ABSTRACT


Considered within this paper is the problem of minimization of a function of unconstrained variables. A wide variety of solutions to this problem is presented and the possible advantages of each method are discussed. For the purpose of this paper these techniques are divided into four broad categories: general search directions; conjugate search directions; Cauchy's Steepest Descent and Newton's method; and variable metric methods.





TABLE OF CONTENTS

I. INTRODUCTION
II. GENERAL SEARCH DIRECTIONS
A. GRID METHOD
B. ALTERNATING VARIABLE METHOD
C. METHOD OF HOOKE AND JEEVES
D. SIMPLEX METHOD: NELDER AND MEADE
E. ROSENBROCK'S METHOD
F. DAVIES, SWANN AND CAMPEY
G. MATRIX ESTIMATOR
III. CONJUGATE SEARCH DIRECTIONS
A. POWELL'S METHOD
B. REVISED CONJUGATE DIRECTIONS BY POWELL
C. ZANGWILL'S METHOD
IV. STEEPEST DESCENT AND NEWTON'S METHOD
A. STEEPEST DESCENT
B. NEWTON'S METHOD
C. MODIFIED NEWTON'S METHOD
V. VARIABLE METRIC METHODS
A. DAVIDON, FLETCHER AND POWELL
B. MURTAGH AND SARGENT
C. PEARSON'S CLASS OF VARIABLE METRIC METHODS
VI. CONCLUSIONS
APPENDIX A: LINEAR SEARCH TECHNIQUES
BIBLIOGRAPHY
INITIAL DISTRIBUTION LIST
FORM DD 1473





I. INTRODUCTION 


The problem under consideration in this paper is that of minimizing f, a function of n unconstrained variables. This is a problem that arises frequently in many widely varied fields. In general, it may be more likely to find that the variables of the function have certain constraints placed upon them. While this problem is conceptually more involved, it is felt that the key to its solution lies in the solution of the unconstrained problem, and thus a great deal of attention has been given to the latter problem. If the unconstrained problem can be solved, then its method of solution can be applied to the constrained problem by the technique of adjoining penalty functions corresponding to the constraints.

The minimization of a function is certainly not a new problem to mathematics. Famous scientists such as Cauchy and Newton long ago devised methods of solution. In fact, their methods remain useful today and will be two of those discussed in this paper. However, the functions to be minimized have become more and more complex, and the classical methods of Cauchy and Newton are often found to be inadequate.

Thus, since the late 1950's, there has been a great deal of research in this field. The key to this surge has been the computer. Difficult and complex methods of solution would be useless without the computer, but its availability has allowed the development of many ingenious new minimization techniques which would previously have been impossible to implement.

With this new research, one fact has become apparent. 
No one method seems to be the most efficient for every 
type of function. Most methods have certain characteristics 
which make them more useful in some cases than others. 
Therefore, apparently there is no simple solution to the
minimization problem. 

Thus, the purpose of this paper is to present a wide selection of these new and old methods of solution. Their relative advantages and disadvantages will be discussed in the hope that the most efficient method can be selected for the function to be minimized. Unfortunately the complexity of this field and the lack of sufficient time have not permitted the author to program many of the methods to be discussed. Since this is important to the rating of the relative efficiencies of these methods, the findings of other researchers are used and referenced to allow further in-depth study of specific problems.





II. GENERAL SEARCH DIRECTIONS


A. GRID METHOD [9]

As we go from the first to the fourth category we go to more and more sophisticated methods. The first method is quite simple, but its applicability is very limited. Its advantage is the ease with which it can be programmed.

This method, to be useful, requires that there be some knowledge of the location of the minimum. Therefore, assume that the minimum, $\hat{x} = (\hat{x}_1, \ldots, \hat{x}_n)$, is known to lie within a certain rectangular region defined as follows:

$$a_i \le x_i \le b_i, \qquad i = 1, \ldots, n,$$

in which $a_i$ and $b_i$ are known. The region is covered by a grid of points spaced $\delta_i = (b_i - a_i)/r_i$ apart along the i'th co-ordinate axis, where $r_i$ is a positive integer.

The function is evaluated at each point of this grid and the smallest value is taken as the minimum of the function. The difficulties with this method are obvious. To obtain a good approximation to the minimum it would be necessary to make each $\delta_i$ small. But a decrease in the size of $\delta_i$ increases the number of points at which the function must be evaluated. The number of evaluations needed is

$$M = (r_1 + 1)(r_2 + 1) \cdots (r_n + 1).$$

If n is large then M can be so great that the method would require far too many evaluations for it to be useful. For example, with $r_i = 10$ and $n = 10$, $M = 11^{10} \approx 2.6 \times 10^{10}$ evaluations.
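The grid method is simple enough to state in a few lines. The Python sketch below assumes the uniform spacing $\delta_i = (b_i - a_i)/r_i$ described above; the name `grid_search` and its arguments are illustrative and are not taken from reference [9]. It returns the evaluation count alongside the best point so that the growth of M can be checked directly.

```python
import itertools


def grid_search(f, a, b, r):
    """Evaluate f at every point of a uniform grid over a_i <= x_i <= b_i.

    r[i] is the number of subintervals along axis i, so the spacing is
    delta_i = (b_i - a_i) / r_i and the grid holds prod(r_i + 1) points.
    """
    axes = [[a_i + k * (b_i - a_i) / r_i for k in range(r_i + 1)]
            for a_i, b_i, r_i in zip(a, b, r)]
    best_x, best_f, count = None, float("inf"), 0
    for point in itertools.product(*axes):
        value = f(point)
        count += 1
        if value < best_f:               # keep the smallest value seen so far
            best_x, best_f = point, value
    return best_x, best_f, count


# Two variables with r_i = 10 require (10 + 1) ** 2 = 121 evaluations.
x_best, f_best, evaluations = grid_search(
    lambda p: (p[0] - 0.4) ** 2 + (p[1] + 0.6) ** 2,
    (-1.0, -1.0), (1.0, 1.0), (10, 10))
```

Because every grid point is visited, the cost is exactly M regardless of where the minimum lies, which is the limitation discussed above.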


B. ALTERNATING VARIABLE METHOD

In this method a basic technique is introduced that will be employed in most of the methods that follow. A point and a direction are chosen in some way, and the minimum value is sought along the resulting line. A direction is a vector in n-space which is usually represented by a column matrix. At this time it will be assumed that, given a straight line, the point on that line at which the function has a minimum value can be determined. Since this is a secondary problem and its solution is rather elementary, it will be dealt with later, in Appendix A, where some techniques for treating this problem are given.

This minimization process is characterized by the use of permanently fixed search directions. Generally, the directions are chosen parallel to the various co-ordinate axes. Each variable in turn is changed, or perturbed, and a search is carried out so as to minimize the function on that line. The effect, as shown below, is that of a staircase, in which the steps decrease in size near the minimum if the function is a quadratic.

In n dimensions, a function whose level surfaces were hyperspheres would be minimized in n searches. But if a function is not of this nice nature then certain difficult problems can arise. For practical use there must be some means of determining when the process should be halted. There are many such rules, called convergence criteria, which can be used. One of these criteria is based upon the change in the value of the function over one iteration [9]. It has been suggested for some methods that if this change is less than some $\epsilon$ then the process should be terminated. But one of the problems that may be encountered in the Alternating Variable Method is that if the principal axis of the function is not aligned, at least approximately, with one of the co-ordinate axes, progress along each search direction may be very small. In this case the convergence criterion discussed above may lead to a halt of the process well before the actual minimum is reached. Since many functions arising in practical problems are not of the "nice" hyperspherical nature, the Alternating Variable Search Method is often too simple for good results [9].
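A minimal Python sketch of the alternating variable method follows. Because the line search techniques of Appendix A are not reproduced here, it assumes a golden-section search on a fixed interval of half-width `half_width` around the current value of each variable; the names `golden_section` and `alternating_variable` are hypothetical. The stopping rule is the change-in-function-value criterion with tolerance `eps` described above.

```python
import math


def golden_section(phi, lo, hi, tol=1e-10):
    """Minimize a unimodal scalar function phi on [lo, hi] (assumed line search)."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = phi(c), phi(d)
    while b - a > tol:
        if fc < fd:                      # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = phi(c)
        else:                            # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = phi(d)
    return 0.5 * (a + b)


def alternating_variable(f, x0, half_width=10.0, eps=1e-12, max_cycles=1000):
    """Minimize f along each co-ordinate axis in turn until a cycle gains < eps."""
    x = list(x0)
    fx = f(x)
    for _ in range(max_cycles):
        f_start = fx
        for i in range(len(x)):
            def phi(t, i=i):             # f restricted to the line x + t * e_i
                trial = list(x)
                trial[i] += t
                return f(trial)
            x[i] += golden_section(phi, -half_width, half_width)
        fx = f(x)
        if f_start - fx < eps:           # change over one cycle below eps
            break
    return x, fx


x_min, f_min = alternating_variable(lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2,
                                    [0.0, 0.0])
```

For this axis-aligned test function one cycle reaches the minimum; for a function whose principal axes are rotated away from the co-ordinate axes, the same loop takes many small staircase steps, which is the weakness noted above.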


C. METHOD OF HOOKE AND JEEVES

The method invented by Hooke and Jeeves attempts to improve the inflexible search routine discussed above. To do this the concepts of exploratory moves and pattern moves are introduced [3,9].

The exploratory process resembles the alternating variable search technique in that it uses the co-ordinate directions to search along. However, it is not assumed that the minimum along each line can be found. Instead $x_i$ is perturbed by an amount $d_i$ while the other variables are held fixed. If the functional value is decreased by this step then the new point replaces the previous one and the next variable is considered. If the function is not decreased then the original $x_i$ is perturbed by $-d_i$, and again the functional values are compared. This new point may or may not replace the old one, but in either case the next variable is then considered. One cycle is complete when all the variables in turn have been perturbed.

The next step taken is what Hooke and Jeeves call a "pattern" move, which is made from the last point arrived at during the exploratory phase. Let us call this last point $a_1$, and let $a_0$ be the point at which the cycle started. The pattern move will then be to the point $2a_1 - a_0$. The purpose of this is to make another move in the general direction of the total progress made during the previous cycle. From this point a new cycle of exploratory moves is performed, and the functional value at the last point is compared with the value of the function at $a_1$. The entire process is then repeated from the point which had the smaller functional value.

This is continued until no progress is made during a cycle of exploratory moves, which may indicate that the present point is within $d_i$ of the minimum, or it may be that the minimum point lies in a steep skew valley. For further progress, then, $d_i$ must be reduced before the process is continued. When $d_i$ becomes less than some specified $\epsilon$ it is assumed that the operation has converged.
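The exploratory and pattern moves can be sketched in Python as follows. This is a minimal sketch, assuming that every $d_i$ is halved (`shrink = 0.5`, an assumed value) whenever a cycle makes no progress, and that the process stops once all $d_i$ fall below `eps`; the function name `hooke_jeeves` is hypothetical.

```python
def hooke_jeeves(f, x0, d0, eps=1e-6, shrink=0.5, max_cycles=100000):
    """Exploratory and pattern moves as described in the text (a sketch)."""
    d = list(d0)

    def explore(base, f_base):
        # Perturb each variable by +d_i, then by -d_i, keeping any improvement.
        x, fx = list(base), f_base
        for i in range(len(x)):
            for step in (d[i], -d[i]):
                trial = list(x)
                trial[i] += step
                f_trial = f(trial)
                if f_trial < fx:
                    x, fx = trial, f_trial
                    break
        return x, fx

    a0, f0 = list(x0), f(x0)
    for _ in range(max_cycles):
        a1, f1 = explore(a0, f0)
        if f1 < f0:
            # Pattern move to 2*a1 - a0, followed by a new exploratory cycle;
            # the process restarts from whichever point has the smaller value.
            p = [2.0 * u - v for u, v in zip(a1, a0)]
            p_best, f_p = explore(p, f(p))
            a0, f0 = (p_best, f_p) if f_p < f1 else (a1, f1)
        elif max(d) < eps:
            break                        # every step size below eps: converged
        else:
            d = [di * shrink for di in d]  # no progress: reduce the d_i
    return a0, f0


x_min, f_min = hooke_jeeves(lambda p: (p[0] - 1.0) ** 2 + 10.0 * (p[1] + 2.0) ** 2,
                            [0.0, 0.0], [0.5, 0.5])
```

Starting from a deliberately small `d0` on the same function shows the slow convergence discussed next: many more cycles are spent before the pattern moves build up useful step lengths.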

A slow rate of convergence is often a very real problem with this method. The choice of $d_i$ is critical. If the initial point is far from the minimum and $d_i$ is relatively small, then the process will be very time consuming, with a great number of functional evaluations needed.

Even with its disadvantages, this method is the first example of a principle which will be applied over and over later in this paper: the method attempts to use past information that has been obtained about the function. Thus the process calls for a move in the direction of the progress made during the exploratory moves, which were made to indicate the general local behavior of the function. The incorporation of any previous knowledge obtained about the function is an important characteristic of most of the more involved minimization methods.


D. SIMPLEX METHOD: NELDER AND MEADE (1965) 

In this method a simplex is defined: this is initially a configuration of n+1 equally spaced points in n-space, for example an equilateral triangle in two-space.

The method presented by Nelder and Meade was designed to eliminate some of the problems that arose in early simplex methods [3,9]. Unlike some of these other methods, this one does not require that the simplex remain equilateral. With this greater flexibility in shape it may be easier for the method to follow the contours of the function and thus not be obstructed by something such as a steep skew valley.

The first step is to evaluate the function at each of the vertices of the simplex. The vertex at which the function is maximum, $V_h$, is then reflected through the centroid, $C$, of the other vertices:

$$V_{new} = C + \alpha (C - V_h)$$

or

$$V_{new} = (1 + \alpha) C - \alpha V_h,$$

where $\alpha > 0$ is the reflection coefficient. If $\alpha = 1$ this method is simply one of the earlier methods in which the simplex remains equilateral. The function is evaluated at $V_{new}$ and this value is compared with the values at the other vertices. There are four different cases which must be considered.
i. First, assume that $f(V_{new})$ is less than the previous second largest value of the function at a vertex but larger than the value at some other vertex. Since the point at which the function was largest was the point that was reflected, this case implies that the previously second largest value of the function has become the largest. Thus $V_{new}$ replaces $V_h$ and the process continues.

ii. Second, assume $f(V_{new})$ is less than the value of the function at all the vertices. This indicates that a direction has been found along which the function can be greatly reduced. Thus it might be wise to investigate this direction further, which would not be the case if condition i held. To take a step further in this direction therefore calls for the use of an expansion coefficient $\gamma > 1$. Define:

$$V_e = \gamma V_{new} + (1 - \gamma) C.$$

The result of this process is to define a new point $V_e$ which is on the same line with $V_{new}$ but is farther from the centroid. If the value of the function at $V_e$ is less than $f(V_{new})$ then $V_e$ replaces $V_h$. Otherwise the point $V_{new}$ replaces $V_h$. In either case the process is then continued.
iii. Third, assume $f(V_{new}) < f(V_h)$ but $f(V_{new})$ is greater than the values of the function at the other vertices. This could indicate that there is a relative minimum somewhere between $V_{new}$ and $V_h$. Thus it may be desirable not to go quite as far as called for by the reflection. This can be accomplished through the use of a contraction coefficient $\beta < 1$. Let us take:

$$V_c = \beta V_{new} + (1 - \beta) C, \qquad 0 < \beta < 1.$$

Again the function is evaluated at this new point and this value is compared with the values of the function at the other vertices. If the result is still a maximum then the contraction is considered a failure and a different procedure must be used. Otherwise $V_c$ replaces $V_h$ and the process continues.

A failure in the contraction could indicate that the simplex has entered a steep skew valley or that a minimum is being approached. In either case the size of the simplex must be reduced so that further progress may be made. One natural way of doing this is to cut by one half the distance of each vertex from the vertex at which the function was a minimum. Thus the simplex is shifted toward what should be a more favorable area. After doing this the reflection process is again continued.

iv. Fourth, assume $f(V_{new}) \ge f(V_h)$. In this case the reflected direction does not seem very favorable, so it is rejected. Again a reduction in the size of the simplex is called for before the procedure is continued.

The following two criteria could be used to halt the 
process: 


i. Let

$$s = \left[ \frac{1}{n+1} \sum_{i=1}^{n+1} \left( f(V_i) - \bar{f} \right)^2 \right]^{1/2}, \qquad \bar{f} = \frac{1}{n+1} \sum_{i=1}^{n+1} f(V_i),$$

be the standard deviation of the function values at the n+1 vertices. Then if s is less than some specified number, convergence might be assumed. In a steep skew valley this criterion could cause a premature halt, and thus if this problem is feared the following criterion should be employed.

ii. Calculate s after each k function evaluations. Convergence would be assumed if successive s's were less than some specified number and the difference between two successive values of $\bar{f}$ was less than some small number.
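The four cases and stopping criterion i can be combined into the Python sketch below. It is a sketch under stated assumptions, not Nelder and Meade's published program: the starting simplex is the initial point plus a step of `scale` along each axis (the text starts from an equally spaced simplex), the coefficients $\alpha = 1$, $\gamma = 2$, $\beta = 0.5$ are assumed values, and the name `simplex_search` is hypothetical.

```python
import numpy as np


def simplex_search(f, x0, scale=1.0, alpha=1.0, gamma=2.0, beta=0.5,
                   tol=1e-10, max_iter=5000):
    """Simplex search following cases i-iv and stopping criterion i (a sketch)."""
    n = len(x0)
    base = np.asarray(x0, dtype=float)
    # Assumed starting simplex: x0 plus a step of `scale` along each axis.
    V = [base] + [base + scale * e for e in np.eye(n)]
    F = [f(v) for v in V]

    def shrink():
        # Halve the distance of every vertex from the lowest vertex.
        lo = int(np.argmin(F))
        for k in range(n + 1):
            if k != lo:
                V[k] = V[lo] + 0.5 * (V[k] - V[lo])
                F[k] = f(V[k])

    for _ in range(max_iter):
        if np.std(F) < tol:             # s, using the 1/(n+1) normalization
            break
        order = np.argsort(F)
        h, lo, second = order[-1], order[0], order[-2]
        C = (sum(V) - V[h]) / n         # centroid of the other vertices
        V_new = C + alpha * (C - V[h])  # reflection
        f_new = f(V_new)
        if f_new < F[lo]:               # case ii: try an expansion
            V_e = gamma * V_new + (1.0 - gamma) * C
            f_e = f(V_e)
            V[h], F[h] = (V_e, f_e) if f_e < f_new else (V_new, f_new)
        elif f_new < F[second]:         # case i: accept the reflection
            V[h], F[h] = V_new, f_new
        elif f_new < F[h]:              # case iii: try a contraction
            V_c = beta * V_new + (1.0 - beta) * C
            f_c = f(V_c)
            if f_c < F[second]:
                V[h], F[h] = V_c, f_c
            else:                       # contraction failed: reduce the simplex
                shrink()
        else:                           # case iv: reject and reduce the simplex
            shrink()
    best = int(np.argmin(F))
    return V[best], F[best]


x_min, f_min = simplex_search(lambda p: (p[0] - 1.0) ** 2 + 5.0 * (p[1] + 2.0) ** 2,
                              [0.0, 0.0])
```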

This simplex method is best suited for problems in which the number of variables is small. The process is relatively slow, and with a large number of variables the required computer time could become intolerably large. One modification that should probably be checked is whether it might be better to reflect through the centroid of only some of the vertices with smaller function values, rather than the centroid of all the vertices. Also, in the expansion phase of the process it would probably be better to continue expanding until a failure is achieved: since this direction is favorable, why should only one expansion be attempted? Since this expansion would result in a new point that may be quite a distance from the remainder of the simplex, it would then be wise to reduce the distances of the other vertices from this point. This has the effect of shifting the simplex toward what should be a more favorable area.


E. ROSENBROCK'S METHOD

This method, devised by Rosenbrock, is a rather obvious development from the method of Hooke and Jeeves discussed earlier [3,14]. The process is usually started by using the co-ordinate directions as the first search directions but, in general, any set of n mutually orthonormal direction vectors could be used. As with the method of Hooke and Jeeves, each direction is considered individually, with a step of length $d_i$ taken along it. If the value of the function at this new point is less than or equal to the value at the original point then the step is termed a success. Otherwise it is considered a failure. If a success has resulted then $d_i$ is multiplied by some $\alpha > 1$. If the result was a failure then $d_i$ is multiplied by $-\beta$, where $0 < \beta < 1$. In either case the next search direction is then investigated. This procedure is continued until a success and a failure have been obtained in each direction. This constitutes the end of one stage.
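One stage of the step-length rule can be sketched in Python as follows. It is a minimal sketch that assumes $\alpha = 3$ and $\beta = 0.5$ as default values and checks the end-of-stage condition after each full sweep through the directions; the name `rosenbrock_stage` is hypothetical. It also accumulates $A_k$, the sum of the successful steps along each direction, which the direction update that follows requires.

```python
import numpy as np


def rosenbrock_stage(f, x, dirs, steps, alpha=3.0, beta=0.5, max_sweeps=10000):
    """One stage: cycle through the directions until each has a success and a failure."""
    x = np.asarray(x, dtype=float)
    fx = f(x)
    steps = [float(s) for s in steps]
    n = len(dirs)
    progress = np.zeros(n)              # A_k: sum of successful steps along d_k
    success, failure = [False] * n, [False] * n
    for _ in range(max_sweeps):
        if all(success) and all(failure):
            break                       # end of the stage
        for k in range(n):
            trial = x + steps[k] * dirs[k]
            f_trial = f(trial)
            if f_trial <= fx:           # equality counts as a success
                x, fx = trial, f_trial
                progress[k] += steps[k]
                steps[k] *= alpha       # success: lengthen the step
                success[k] = True
            else:
                steps[k] *= -beta       # failure: reverse and shorten the step
                failure[k] = True
    return x, fx, steps, progress


x1, f1, steps1, progress = rosenbrock_stage(
    lambda p: (p[0] - 1.0) ** 2 + (p[1] - 2.0) ** 2,
    [0.0, 0.0], np.eye(2), [0.1, 0.1])
```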

After each stage is completed new search directions must be defined. This is done through the use of the following vectors:

$$a_k = \sum_{i=k}^{n} A_i d_i^j, \qquad k = 1, \ldots, n,$$

where $d_k^j$, a unit vector, is the k'th search direction in the j'th stage, and $A_k$ is the sum of all the steps taken in the direction of $d_k^j$. From this definition it is readily apparent that $a_1$ is the total progress made during that stage, $a_2$ is the total progress made in all the directions other than the first, etc. An important property of these vectors is that they are linearly independent: since $a_k - a_{k+1} = A_k d_k^j$, this holds exactly when every $A_k$ is nonzero. This property results from the choice of the definition for a success. The linear independence of the $a_k$'s would be lost if at any time there was no progress made along one of the search directions during a stage. At first this seems to be a possibility, since it could happen that a stage is started at a point that minimizes the function along one of the search directions. But allowing equality in the definition of a success eliminates this problem. During the process the step size along such a direction would be reduced to such an extent that, for computer use, the values of the function at the two points would be the same, and thus a success is obtained. While this step may get very, very small it will still be different from zero, and thus some progress is always made in every direction.


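The new directions are defined as follows by the Gram-Schmidt orthogonalization process:

$$b_k = a_k - \sum_{i=1}^{k-1} \left( a_k' d_i^{j+1} \right) d_i^{j+1}$$

and

$$d_k^{j+1} = \frac{b_k}{\lVert b_k \rVert}, \qquad k = 1, \ldots, n.$$

Thus these new vectors form a set of n mutually orthonormal search directions.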
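The construction of the $a_k$ and their Gram-Schmidt orthonormalization translate directly into NumPy. In this sketch the old directions are stored as the rows of an array and `progress` holds the $A_k$; the function name `rosenbrock_new_directions` is hypothetical.

```python
import numpy as np


def rosenbrock_new_directions(dirs, progress):
    """Build a_k = sum_{i>=k} A_i d_i and orthonormalize them by Gram-Schmidt."""
    D = np.asarray(dirs, dtype=float)          # row k is d_k for the stage just finished
    A = np.asarray(progress, dtype=float)      # A_k: sum of the steps taken along d_k
    n = len(A)
    a = [sum(A[i] * D[i] for i in range(k, n)) for k in range(n)]
    new = []
    for a_k in a:
        b_k = a_k - sum((a_k @ d) * d for d in new)   # remove earlier components
        new.append(b_k / np.linalg.norm(b_k))
    return np.array(new)


# Total progress (2, 1) in the plane: the first new direction points along it.
new_dirs = rosenbrock_new_directions(np.eye(2), [2.0, 1.0])
```

The first row of `new_dirs` is $a_1/\lVert a_1 \rVert$, the direction of total progress, which is how the method aligns its search directions with the valley of the function.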

There are various criteria that could be used to stop this process. A limit could be set upon the number of function evaluations to be made during the process. This obviously may halt the method well before the minimum is reached, but it is helpful in avoiding the use of too much computer time. The process could also be terminated if $\lVert a_1 \rVert$ is smaller than some given number [3]. This would mean that the progress made during one stage was very small. This seems to be a quite natural stopping criterion, for surely as the minimum is approached the progress made will get less and less. Unfortunately this could also be the characteristic of a steep skew valley. If there is a possibility that the function has this property, then great care must be taken to avoid a premature halt in the process. A third criterion for convergence could be $\lVert a_2 \rVert / \lVert a_1 \rVert > 0.3$. This should be used only if the $d_i$'s are scaled to have similar magnitudes [3]. The reasoning behind this criterion is that $\lVert a_2 \rVert / \lVert a_1 \rVert > 0.3$ indicates that the direction of total progress is rapidly changing, which is again a characteristic trait of the function in the vicinity of the minimum. This rapid change could also be present early in the process, so this convergence criterion should be applied only after a number of stages have been completed.

There is a close relationship between the pattern move devised by Hooke and Jeeves and the $a_1$ defined above: both are in the direction of the total progress made during one stage. Rosenbrock's method is far superior, though, because of its complete use of the knowledge gained about the function, as is evident in the generation of the new search directions [3,9]. This has the property of aligning the search directions with the principal axes of the function.

The ease with which this method can be adapted to computer use and its relative stability have shown it to be one of the most useful of the direct search procedures.


F. DAVIES, SWANN AND CAMPEY (1964)

This method is a further refinement of the useful method invented by Rosenbrock [3,9]. It attempts to remove the restriction of a fixed step length. As with Rosenbrock's method, the search directions are n mutually orthonormal vectors chosen initially parallel to the co-ordinate axes. In this case, though, it will be assumed that, within a certain degree of accuracy, the minimum along each search direction can be found. This assumption introduces one difficulty that Rosenbrock's method does not have.

It may not be possible to make progress along a certain search direction, and thus the vectors $a_k$, as defined in Rosenbrock's method, would not be linearly independent. Assume that (n-m) linearly independent vectors can be formed. These vectors are orthogonalized as described previously and become (n-m) search directions. The other m directions needed are those along which no progress was made during the previous cycle. Since none of the new vectors has a component in the direction of these m vectors, and the m vectors were already orthonormal, it is evident that again the process is begun with n mutually orthonormal vectors. Convergence criteria similar to those suggested with Rosenbrock's method can be applied to this method.
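The bookkeeping for directions that made no progress can be sketched in NumPy as follows. This is a minimal sketch under one assumption not spelled out in the text: the (n-m) vectors are built exactly as Rosenbrock's $a_k$, but summing only over the directions that moved. The function name `dsc_new_directions` and the `zero_tol` parameter are hypothetical.

```python
import numpy as np


def dsc_new_directions(dirs, progress, zero_tol=0.0):
    """New orthonormal directions when some searches made no progress (a sketch)."""
    D = np.asarray(dirs, dtype=float)          # row k is the k-th direction
    A = np.asarray(progress, dtype=float)      # A_k: total step taken along row k
    moved = [k for k in range(len(A)) if abs(A[k]) > zero_tol]
    stalled = [k for k in range(len(A)) if abs(A[k]) <= zero_tol]
    # (n - m) progress vectors built only from the directions that moved.
    a = [sum(A[i] * D[i] for i in moved[k:]) for k in range(len(moved))]
    new = []
    for vec in a:                              # Gram-Schmidt, as in Rosenbrock's method
        b = vec - sum((vec @ q) * q for q in new)
        new.append(b / np.linalg.norm(b))
    # The m directions with no progress are carried over unchanged.
    new.extend(D[k] for k in stalled)
    return np.array(new)


# No progress along the second axis: it is kept, the other two are rebuilt.
new_dirs = dsc_new_directions(np.eye(3), [1.5, 0.0, -0.5])
```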

The assumption that was made in regard to minimization along a line introduces an added factor that must be considered when choosing between this method and the one devised by Rosenbrock. Depending upon the technique used to obtain this minimum, a larger number of function evaluations might be needed. Thus, if this latest method does not significantly increase the rate of convergence, the additional evaluations required might indicate that Rosenbrock's method is better in that case. Also, when far from the minimum there is no real advantage to obtaining the exact minimum along any specific direction, so again the fixed step length may have an advantage because of the less time required. In general, though, this method has been found superior to the method of Rosenbrock [3,9].


G. MATRIX ESTIMATOR

In this section it will be assumed implicitly that the function may be approximated by a quadratic, at least in some region. A method is developed for determining the coefficients of this quadratic and using these to estimate the point where f attains its minimum [6].
Let us now consider the case in which the function f is a quadratic in the form:

$$f(x) = \tfrac{1}{2} x'Ax + b'x + c,$$

where $x' = (x_1, \ldots, x_n)$ and A is a symmetric matrix. Under these conditions, if the matrix A is known then it may be shown that the minimization problem is easy to solve, as follows. Let $\hat{x}$ be the point where the minimum occurs and consider:

$$\nabla f(x) = b + Ax$$

and

$$\nabla f(\hat{x}) = 0 = b + A\hat{x}.$$

If these are subtracted we get:

$$\nabla f(x) - \nabla f(\hat{x}) = Ax - A\hat{x},$$

whence

$$\hat{x} = x - A^{-1} \nabla f(x). \tag{1-1}$$

Thus (1-1) can be applied to find $\hat{x}$ if A and $\nabla f(x)$ can be determined. The matrix A can be found as follows:
Consider a sequence of points $x^{k+1} = x^k + p_k$, where $p_k = \lambda_k d_k$ and $\lambda_k$ is selected to minimize f along the line defined by $x^k$ and $d_k$. Then

$$\begin{aligned} f(x^{k+1}) &= c + b'(x^k + p_k) + \tfrac{1}{2}(x^k + p_k)'A(x^k + p_k) \\ &= f(x^k) + \nabla f(x^k)' p_k + p_k' A p_k / 2 \end{aligned} \tag{1-2}$$

and

$$\begin{aligned} f(x^k) &= c + b'(x^{k+1} - p_k) + \tfrac{1}{2}(x^{k+1} - p_k)'A(x^{k+1} - p_k) \\ &= f(x^{k+1}) - \nabla f(x^{k+1})' p_k + p_k' A p_k / 2. \end{aligned} \tag{1-3}$$

But $\nabla f(x^{k+1})' p_k = 0$, because f was minimized along $d_k$. Hence,

$$f(x^{k+1}) = f(x^k) - p_k' A p_k / 2. \tag{1-4}$$

This last relation is very useful in determining the elements of the matrix A, as follows.


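Let us define $e_k$ to be a column vector that is zero except for a one in the k'th position, and choose $d_k = e_k$, $k = 1, \ldots, n$, so that the points $x^1, x^2, \ldots, x^{n+1}$ are generated by searches along the co-ordinate axes. For this choice of $d_k$, $p_k' A p_k = \lambda_k^2 a_{kk}$ and equation (1-4) reduces to:

$$a_{kk} = \frac{2\left[ f(x^k) - f(x^{k+1}) \right]}{\lambda_k^2}, \qquad k = 1, \ldots, n, \tag{1-5}$$

where $a_{kk}$, $k = 1, \ldots, n$, are the diagonal elements of A. The above equation thus generates the diagonal elements of A by the use of function values only.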


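To obtain the off-diagonal elements let us define

$$d_{ij} = e_i + e_j, \qquad i = 1, \ldots, n-1, \quad j = i+1, \ldots, n.$$

Then let $\lambda_{ij}$ be the scalar that minimizes f along $d_{ij}$, taking the search from some point $x^k$ to $x^{k+1} = x^k + \lambda_{ij} d_{ij}$. Since $d_{ij}' A d_{ij} = a_{ii} + 2a_{ij} + a_{jj}$, this reduces equation (1-4) to the form:

$$f(x^k) - f(x^{k+1}) = \frac{\lambda_{ij}^2}{2} \left( a_{ii} + 2a_{ij} + a_{jj} \right),$$

so that

$$a_{ij} = \frac{f(x^k) - f(x^{k+1})}{\lambda_{ij}^2} - \frac{a_{ii} + a_{jj}}{2}, \qquad i = 1, \ldots, n-1, \quad j = i+1, \ldots, n, \tag{1-6}$$

and

$$a_{ji} = a_{ij}.$$

Equations (1-5) and (1-6) thus completely define the matrix A. This procedure is accomplished by minimization along n(n+1)/2 directions. If it happens that one of the λ's is zero then it is assigned the value of a very small but nonzero constant, and equations (1-5) and (1-6) are used to generate approximations to $a_{ii}$ and $a_{ij}$. After A has been found, the problem remaining is the determination of $\nabla f(x)$.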
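For the choice $d_k = e_k$, $k = 1, \ldots, n$, and since f is quadratic,

$$\frac{\partial f(x^{n+1})}{\partial x_i} = \frac{\partial f(x^{i+1})}{\partial x_i} + \sum_{j=1}^{n} a_{ij} \left( x_j^{n+1} - x_j^{i+1} \right). \tag{1-7}$$

But $\partial f(x^{i+1}) / \partial x_i = 0$, since $x^{i+1}$ minimizes f along $e_i$, and $x_j^{i+1} = x_j^{n+1}$ for $j = 1, \ldots, i$, since the later searches change only the variables $x_{i+1}, \ldots, x_n$. These conditions reduce (1-7) to the form:

$$\frac{\partial f(x^{n+1})}{\partial x_i} = \sum_{j=i+1}^{n} a_{ij} \left( x_j^{n+1} - x_j^{i+1} \right). \tag{1-8}$$

Complete knowledge of A thus enables the calculation of the gradient at any point in the above sequence. A development very similar to that used to derive (1-1) produces the following results: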
$$\nabla f(y) = \nabla f(x^{n+1}) + A(y - x^{n+1}) \tag{1-9}$$

$$\hat{x} = y - A^{-1} \nabla f(y) \tag{1-10}$$
In general, of course, most functions of interest are not quadratic. But since in the neighborhood of the minimum we will generally assume that they closely approximate a quadratic, it may be possible to adapt the above procedure to an iterative process. Each set of n(n+1)/2 searches would produce a new approximation to A. By using this approximation in equations (1-9) and (1-10) it may be possible to approach the minimum. Unfortunately there is no guarantee that far from the minimum this method would produce a good or even useful approximation to the matrix A. Therefore, if only function values are to be used, it would probably be best to employ one of the other methods, such as Rosenbrock's, until it is felt that the process has reached a point that is reasonably close to the minimum. Switching over to this latter method at that time could be very valuable because of its exact nature for quadratic functions. Of course, in the use of this method it must be realized that more storage space will be required in the computer, since the matrix A must be stored. And since each cycle of this method requires n(n+1)/2 searches rather than the previously used n directions, significant progress must be made at each stage to warrant its use.
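The whole estimator can be checked on a known quadratic. The Python sketch below uses function values only: each line minimum is found by fitting a parabola through the steps $t = -1, 0, 1$, which is exact for a quadratic and stands in for the line search techniques of Appendix A. It then applies (1-5), (1-6), (1-8) and (1-1). The names `parabolic_step` and `matrix_estimator` are hypothetical, and the sketch assumes every λ is nonzero (the small-constant fallback described above is not implemented).

```python
import numpy as np


def parabolic_step(f, x, d):
    """Exact minimizing step along x + t*d when f is quadratic (function values only)."""
    f_minus, f_zero, f_plus = f(x - d), f(x), f(x + d)
    return (f_minus - f_plus) / (2.0 * (f_plus - 2.0 * f_zero + f_minus))


def matrix_estimator(f, x_start):
    """Estimate A by (1-5) and (1-6), the gradient by (1-8), and the minimum by (1-1)."""
    n = len(x_start)
    E = np.eye(n)
    A = np.zeros((n, n))
    # Co-ordinate searches: points[i] is the point reached after searching along e_i.
    points = [np.asarray(x_start, dtype=float)]
    for k in range(n):
        lam = parabolic_step(f, points[-1], E[k])
        points.append(points[-1] + lam * E[k])
        A[k, k] = 2.0 * (f(points[-2]) - f(points[-1])) / lam ** 2     # (1-5)
    # Searches along d_ij = e_i + e_j give the off-diagonal elements.
    x = points[-1]
    for i in range(n - 1):
        for j in range(i + 1, n):
            d = E[i] + E[j]
            lam = parabolic_step(f, x, d)
            x_next = x + lam * d
            A[i, j] = A[j, i] = ((f(x) - f(x_next)) / lam ** 2
                                 - 0.5 * (A[i, i] + A[j, j]))            # (1-6)
            x = x_next
    # Gradient at the last co-ordinate point, equation (1-8).
    last = points[-1]
    grad = np.array([sum(A[i, j] * (last[j] - points[i + 1][j]) for j in range(i + 1, n))
                     for i in range(n)])
    return A, last - np.linalg.solve(A, grad)                            # (1-1)


A_true = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 0.5], [0.0, 0.5, 2.0]])
b = np.array([1.0, -2.0, 0.5])


def quadratic(x):
    return 0.5 * x @ A_true @ x + b @ x


A_est, x_hat = matrix_estimator(quadratic, [1.0, 1.0, 1.0])
```

For a non-quadratic function the same routine would return only an approximation to A, which is why the text recommends switching to it only near the minimum.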
III. CONJUGATE SEARCH DIRECTIONS


In this chapter we will consider some methods based on the idea of conjugate directions.


Let us again consider a quadratic function of the form:

$$f(x) = c + b'x + \tfrac{1}{2} x'Ax.$$

Two directions $d_i$ and $d_j$ are called conjugate with respect to A if

$$d_i' A d_j = 0, \qquad i \ne j.$$


Conjugate directions play a significant role in recent developments in minimization theory, as shall be seen in the following discussion. Assume that $d_1, \ldots, d_n$ are n mutually conjugate directions. Let $x^1 = x^0 + \sum_{i=1}^{n} \lambda_i^* d_i$, where the $\lambda_i^*$'s are selected to minimize $f(x^0 + \sum_{i=1}^{n} \lambda_i d_i)$. We shall see that the resulting point furnishes the desired minimum to f.


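Consider:

$$\begin{aligned} f\Big(x^0 + \sum_{i=1}^{n} \lambda_i d_i\Big) &= \tfrac{1}{2}\Big(x^0 + \sum_{i=1}^{n} \lambda_i d_i\Big)' A \Big(x^0 + \sum_{i=1}^{n} \lambda_i d_i\Big) + b'\Big(x^0 + \sum_{i=1}^{n} \lambda_i d_i\Big) + c \\ &= f(x^0) + \sum_{i=1}^{n} \Big( \tfrac{1}{2} \lambda_i^2 d_i' A d_i + \lambda_i d_i' (A x^0 + b) \Big), \end{aligned}$$

since the cross terms $\lambda_i \lambda_j d_i' A d_j$, $i \ne j$, vanish by conjugacy. Therefore to select the $\lambda_i$'s to minimize $f(x^0 + \sum_{i=1}^{n} \lambda_i d_i)$ is the same problem as selecting each $\lambda_i$ to minimize $\tfrac{1}{2}\lambda_i^2 d_i' A d_i + \lambda_i d_i'(A x^0 + b)$. Therefore the choice of each $\lambda_i$ is independent of every other $\lambda_j$, $j \ne i$.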
This implies that if n conjugate directions are used it is sufficient to minimize once along each direction to obtain the minimum value of the function. Since n arbitrarily chosen linearly independent vectors usually do not have this property, the advantage in using conjugate directions is clearly evident. Let us now take up methods which in one way or another make use of searches in conjugate directions.
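The one-search-per-direction property can be verified numerically. The sketch below manufactures a set of mutually conjugate directions by applying Gram-Schmidt to the unit vectors in the inner product $u'Av$ (a convenient construction for illustration, not one of the methods discussed here), then minimizes exactly once along each direction. The matrix A and vector b are arbitrary test data, and the function names are hypothetical.

```python
import numpy as np


def a_conjugate_basis(A):
    """Gram-Schmidt on the unit vectors using the A inner product u'Av."""
    dirs = []
    for e in np.eye(A.shape[0]):
        d = e - sum((e @ A @ p) / (p @ A @ p) * p for p in dirs)
        dirs.append(d)
    return dirs


def line_minimum(A, b, x, d):
    """Exact minimizer of f(x) = c + b'x + x'Ax/2 along x + t*d."""
    return x - (d @ (A @ x + b)) / (d @ A @ d) * d


A = np.array([[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 2.0]])
b = np.array([1.0, -2.0, 0.5])
dirs = a_conjugate_basis(A)
x = np.zeros(3)
for d in dirs:                 # exactly one minimization per conjugate direction
    x = line_minimum(A, b, x, d)
```

After the loop, `x` agrees with the solution of $Ax = -b$, the minimum of the quadratic, even though each search was performed only once.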


A. POWELL'S METHOD 

This method depends upon the following manner of generating conjugate directions. Assume that the function is a positive definite quadratic. Let us pick a direction $d_1$ and two points $x^0$ and $x^1$ such that $x^0 - x^1$ is not a multiple of $d_1$. Let us define:

$$x^2 = x^0 + \alpha d_1$$

$$x^3 = x^1 + \beta d_1$$

where α and β are chosen such that $f(x^0 + \alpha d_1)$ and $f(x^1 + \beta d_1)$ attain the minimum values on their respective lines. Then

$$d_1' \nabla f(x^2) = d_1'(A x^2 + b) = 0$$

and

$$d_1' \nabla f(x^3) = d_1'(A x^3 + b) = 0,$$

so that

$$d_1' A (x^3 - x^2) = 0. \tag{2-1}$$
Equation (2-1) shows that the direction $(x^3 - x^2)$ is conjugate to the original search direction. It is this reasoning that Powell used to develop his method [13].
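Equation (2-1) is easy to confirm numerically. The sketch below uses an arbitrary positive definite A, vector b, direction $d_1$ and starting points (all assumed test data) and an exact line search valid for quadratics; the helper name `line_minimum` is hypothetical.

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 2.0]])  # assumed positive definite
b = np.array([1.0, -2.0, 0.5])


def line_minimum(x, d):
    """Exact minimizer of f(x) = c + b'x + x'Ax/2 along x + t*d."""
    return x - (d @ (A @ x + b)) / (d @ A @ d) * d


d1 = np.array([1.0, 2.0, -1.0])
x0 = np.zeros(3)
x1 = np.array([1.0, -1.0, 2.0])          # x0 - x1 is not a multiple of d1
x2 = line_minimum(x0, d1)                # minimum along d1 through x0
x3 = line_minimum(x1, d1)                # minimum along d1 through x1
conjugacy = d1 @ A @ (x3 - x2)           # equation (2-1): should vanish
```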

As with most methods involving search directions, this procedure is begun with a choice of n linearly independent search directions $d_1, \ldots, d_n$. The function is then minimized along each direction in succession, with $x^n$ being the point resulting from the minimization along $d_n$, the last search direction. From this the direction $x^n - x^0$ is obtained, where $x^0$ is the initial point. This direction is used as another search direction along which to minimize. The result of this is a new initial point from which to begin another round of searches along n linearly independent directions. The last n-1 directions are retained but advanced in index by one, as follows:

$$\bar{d}_j = d_{j+1}, \qquad j = 1, \ldots, n-1,$$

$$\bar{d}_n = x^n - x^0.$$
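One cycle of this procedure can be sketched in Python for a quadratic. The sketch assumes an exact line search as a stand-in for the techniques of Appendix A, starts from the co-ordinate directions, and stops when a cycle makes no progress; the function name `powell_basic` is hypothetical. On the two-variable example below it reaches the minimum at the origin after two cycles, and the two directions it retains are conjugate.

```python
import numpy as np


def powell_basic(A, b, x0, max_cycles=50, tol=1e-12):
    """Powell's basic cycle on f(x) = c + b'x + x'Ax/2 (a sketch)."""

    def line_minimum(x, d):
        # Exact line search, standing in for the techniques of Appendix A.
        return x - (d @ (A @ x + b)) / (d @ A @ d) * d

    dirs = list(np.eye(len(x0)))
    x = np.asarray(x0, dtype=float)
    for _ in range(max_cycles):
        x_start = x                        # x^0 for this cycle
        for d in dirs:
            x = line_minimum(x, d)         # ends at x^n
        new_dir = x - x_start              # x^n - x^0
        if np.linalg.norm(new_dir) < tol:
            break                          # no progress during the cycle
        x = line_minimum(x, new_dir)       # new initial point
        dirs = dirs[1:] + [new_dir]        # drop d_1, append x^n - x^0
    return x, dirs


A = np.array([[2.0, 1.0], [1.0, 2.0]])
b = np.zeros(2)
x_min, final_dirs = powell_basic(A, b, [1.0, 1.0])
```

Nothing in this loop guards against the loss of linear independence discussed next; if a cycle starts at a point that already minimizes f along $d_1$, the retained directions can stop spanning the space.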


There is a possibility that no progress might be made along a certain search direction during any given cycle. For example, assume that a cycle is begun at a point which minimizes the function along $d_1$. Then no further progress is made along this direction, which means that $x^n - x^0$ will have no component along $d_1$. But $d_1$ is deleted from the next round of searches, which implies that the n search directions will not span the given space. If this problem does not arise then the method will generate n conjugate directions after n stages, and the next search stage would therefore produce the desired minimum regardless of where the initial point was located. Of course, all this was done under the assumption that the function was a positive definite quadratic. Since, in general, this is not in fact the case, the process must be applied iteratively.

The problem of loss of linear independence is a serious one, though, and cannot be overlooked. If this problem arose the method could not converge no matter how long the computer worked on it. Since conjugate directions seem to offer significant advantages in the minimization process, a refinement of the above method was sought to eliminate its flaws. Just such a method was suggested by Powell in 1964.


B. REVISED CONJUGATE DIRECTIONS BY POWELL [2]
It is assumed that the method is begun with n directions $d^1, \dots, d^n$. These, and each subsequent set, are to be linearly independent and scaled such that:

$$(d^i)' A d^i = 1, \quad i = 1, \dots, n.$$

Let $\det D = \det(d^1 \cdots d^n)$. It will now be shown that this determinant is maximized when the d's are mutually conjugate. Let $v^i$, $i = 1, \dots, n$, be a set of n conjugate, nonzero vectors scaled so that $(v^m)' A v^m = 1$. Since they are conjugate and nonzero they must be linearly independent. This implies that each $d^k$ can be written as a linear combination as follows:

$$d^k = \sum_{m=1}^{n} u_{km} v^m, \qquad (d^1 \cdots d^n) = (v^1 \cdots v^n)\, U', \quad U = [u_{km}].$$

Therefore

$$|\det D| = |\det(v^1 \cdots v^n)| \, |\det U|. \qquad \text{(2-2)}$$

Also

$$(d^j)' A d^k = \Big(\sum_{m=1}^n u_{jm} v^m\Big)' A \Big(\sum_{p=1}^n u_{kp} v^p\Big) = \sum_{m=1}^n \sum_{p=1}^n u_{jm} u_{kp} (v^m)' A v^p = \sum_{m=1}^n u_{jm} u_{km},$$

since $(v^m)' A v^p = 0$ for $m \ne p$ and $(v^m)' A v^m = 1$. Therefore

$$1 = (d^j)' A d^j = \sum_{k=1}^n u_{jk}^2, \qquad \text{(2-3)}$$

so every row of U has unit length. By Hadamard's inequality $|\det U|$ is at most the product of the lengths of its rows, so equation (2-3) shows that $|\det U|$ cannot exceed one, and it equals one only if U is an orthogonal matrix. If this is the case:

$$\sum_{m=1}^n u_{jm} u_{km} = 0, \quad j \ne k. \qquad \text{(2-4)}$$

Since the left side of (2-4) is $(d^j)' A d^k$, equation (2-4) implies that the directions $d^1, \dots, d^n$ are mutually conjugate. Therefore, since $v^1, \dots, v^n$ were chosen arbitrarily, it can be seen from (2-2) that $|\det D|$ is maximized when $|\det U|$ is maximized, which happens only when the given directions are conjugate. This result forms the basis for the new method devised by Powell.
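The determinant argument can be checked numerically. The sketch below (Python with NumPy; the matrix A, the random seed and the helper names are illustrative assumptions) scales arbitrary directions so that $d'Ad = 1$ and compares $|\det D|$ with the value attained by a scaled conjugate set, which is $1/\sqrt{\det A}$ because $V'AV = I$ gives $(\det V)^2 \det A = 1$.

```python
import numpy as np


def scale_columns(D, A):
    """Scale each column d of D so that d'Ad = 1."""
    return D / np.sqrt(np.diag(D.T @ A @ D))


def conjugate_bound(A):
    """|det V| for any n scaled, mutually conjugate columns (V'AV = I)."""
    return 1.0 / np.sqrt(np.linalg.det(A))


A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
rng = np.random.default_rng(0)
bound = conjugate_bound(A)

# Arbitrary scaled directions: |det D| should never exceed the bound.
worst = max(abs(np.linalg.det(scale_columns(rng.standard_normal((3, 3)), A))) / bound
            for _ in range(500))

# The eigenvectors of A are mutually conjugate, so after scaling they attain it.
_, V = np.linalg.eigh(A)
attained = abs(np.linalg.det(scale_columns(V, A))) / bound
print(worst, attained)   # worst <= 1, attained close to 1
```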

For the most part this new method resembles the first method by Powell: after the n search directions are used, it is desired to look at the direction $x^n - x^0$. Again this direction is examined since it appears to be along this direction that the process is progressing toward the minimum. Unlike the earlier method, though, this new direction is not automatically accepted as a new search vector. It must first be determined whether replacing one of the vectors in D by this new vector would increase det D. If it would, then it can be reasoned that the directions must be approaching conjugacy. Obviously det D would not increase if the replacement made det D = 0, which would be the case if the new direction were not linearly independent of the remaining ones. Thus using the new direction only when it increases det D ensures that the linear dependence problem of the earlier method is eliminated. The question then arises as to which of the old directions should be deleted when the new direction is added.


If we assume that the vectors are scaled such that $(d^i)' A d^i = 1$, and that $x^i = x^{i-1} + \alpha^i d^i$ where $\alpha^i$ minimizes f along $d^i$, then for the quadratic $f(x) = \tfrac12 x'Ax + b'x + c$

$$f(x^{i-1}) = \tfrac12 (x^i - \alpha^i d^i)' A (x^i - \alpha^i d^i) + b'(x^i - \alpha^i d^i) + c,$$

$$f(x^i) = \tfrac12 (x^i)' A x^i + b' x^i + c,$$

$$f(x^{i-1}) - f(x^i) = \tfrac12 (\alpha^i)^2 (d^i)' A d^i - \alpha^i (d^i)'(A x^i + b) = \tfrac12 (\alpha^i)^2 - \alpha^i (d^i)' \nabla f(x^i) = \tfrac12 (\alpha^i)^2,$$

since $(d^i)' \nabla f(x^i) = 0$ at the minimum along $d^i$. Therefore

$$(\alpha^i)^2 = 2\big(f(x^{i-1}) - f(x^i)\big). \qquad \text{(2-5)}$$

Now $x^n - x^0 = \sum_{k=1}^n \alpha^k d^k = \mu d$, where $\mu > 0$ is chosen so that $d'Ad = 1$. If d replaces $d^k$, the following results:

$$\det(d^1, \dots, d^{k-1}, d, d^{k+1}, \dots, d^n) = \frac{1}{\mu} \sum_{j=1}^n \alpha^j \det(d^1, \dots, d^{k-1}, d^j, d^{k+1}, \dots, d^n) = \frac{\alpha^k}{\mu} \det D,$$

because every term with $j \ne k$ contains a repeated column. From this it can be seen that replacing $d^k$ by d has the effect of multiplying the determinant by $\alpha^k/\mu$. This multiplication factor is greatest in magnitude when the largest $|\alpha^k|$ is chosen. But since, by (2-5), $(\alpha^k)^2$ is twice the reduction in the value of the function when minimizing along $d^k$, the largest $|\alpha^k|$ corresponds to the direction along which the function underwent the greatest reduction in value. Thus the new direction should replace whichever direction produced the greatest reduction in the value of the function, as long as $|\alpha^k|/\mu \ge 1$. If this last inequality is not satisfied for that k, it is satisfied for no k, and every substitution would reduce $|\det D|$, which is contrary to the desired result. As with Powell's first method, this process also calls for minimization along $x^n - x^0$, starting from $x^0$, to obtain a new point from which to begin the next round of searches. This step, though, involves further problems that must be considered.
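The determinant identity behind the replacement rule is easy to confirm numerically; it holds for any nonsingular D. In this sketch (Python with NumPy) the directions, the step lengths $\alpha^k$ and the value of μ are made-up numbers, and `replacement_index` is a hypothetical helper that applies the rule from the text: pick the direction of largest decrease and accept it only if $|\alpha^k|/\mu \ge 1$.

```python
import numpy as np


def det_after_replacement(D, alpha, mu, k):
    """det D after column k is replaced by d = (1/mu) * sum_j alpha_j d^j."""
    d = D @ alpha / mu
    Dk = D.copy()
    Dk[:, k] = d
    return np.linalg.det(Dk)


def replacement_index(decreases, mu):
    """Index of the direction to replace, or None if |det D| would not increase.

    decreases[k] = f(x^{k-1}) - f(x^k); by (2-5) |alpha^k| = sqrt(2 * decreases[k]).
    """
    k = int(np.argmax(decreases))
    return k if np.sqrt(2.0 * decreases[k]) / mu >= 1.0 else None


rng = np.random.default_rng(1)
D = rng.standard_normal((3, 3))
alpha = np.array([0.5, -2.0, 1.0])
mu = 1.7
for k in range(3):
    # Each ratio should equal alpha[k] / mu.
    print(k, det_after_replacement(D, alpha, mu, k) / np.linalg.det(D), alpha[k] / mu)
```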

Using the three points $x^0$, $x^n$ and $2x^n - x^0$, let us use a quadratic interpolation to obtain the minimum along the line joining $x^n$ and $x^0$. Since these three points are equally spaced, the following function can be used for this purpose:

$$g(t) = at^2 + bt + c,$$

where

$$g(-1) = g_1 = f(x^0), \qquad g(0) = g_2 = f(x^n), \qquad g(1) = g_3 = f(2x^n - x^0).$$

Solving this system of three equations in three unknowns produces the following results:

$$a = \tfrac12 (g_1 - 2g_2 + g_3), \qquad b = \tfrac12 (g_3 - g_1), \qquad c = g_2.$$

The value of t, $t_s$, for which g(t) has its minimum (or maximum) value can be found by setting $dg/dt$ equal to zero, which gives

$$t_s = -\frac{b}{2a} = \frac{g_1 - g_3}{2(g_1 - 2g_2 + g_3)}.$$

The value of g at this point is:

$$g(t_s) = a t_s^2 + b t_s + c = g_2 - \frac{(g_1 - g_3)^2}{8(g_1 - 2g_2 + g_3)}.$$

It must first be ensured that $t_s$ actually gives a minimum of g and not a maximum. This condition is satisfied if

$$\frac{d^2 g}{dt^2} = 2a = g_1 - 2g_2 + g_3 > 0.$$

The point $x_s$ corresponding to $t_s$ is given as follows:

$$x_s = x^n + t_s (x^n - x^0).$$
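The interpolation formulas can be collected into a small helper. This sketch (plain Python; the function names are hypothetical) returns $t_s$ and $g(t_s)$, refuses to return a point when $g_1 - 2g_2 + g_3 \le 0$ because the fitted parabola then has a maximum rather than a minimum, and maps $t_s$ back to $x_s$.

```python
def parabolic_minimum(g1, g2, g3):
    """Fit g(t) = a t^2 + b t + c through g(-1)=g1, g(0)=g2, g(1)=g3.

    Returns (t_s, g(t_s)); raises ValueError when g1 - 2*g2 + g3 <= 0.
    """
    curvature = g1 - 2.0 * g2 + g3          # = 2a = d^2 g / dt^2
    if curvature <= 0.0:
        raise ValueError("g1 - 2*g2 + g3 must be positive for a minimum")
    t_s = (g1 - g3) / (2.0 * curvature)
    g_s = g2 - (g1 - g3) ** 2 / (8.0 * curvature)
    return t_s, g_s


def interpolated_point(x0, xn, t_s):
    """x_s = x^n + t_s (x^n - x^0), with x0 and xn given as lists of floats."""
    return [b + t_s * (b - a) for a, b in zip(x0, xn)]
```

For example, the values of $g(t) = (t - 0.3)^2 + 1$ at t = -1, 0, 1 are 2.69, 1.09 and 1.49, and the helper recovers $t_s = 0.3$ and $g(t_s) = 1$.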


Now consider the position of $x_s$ on the line joining $x^0$ and $x^n$. Let $x^n - x^0 = \mu d$, with $\mu > 0$ and $d'Ad = 1$ as before. Since $x_s$ minimizes f along this line, (2-5) applied along d shows that a point $x_s + s\,d$ of the line has the value $f(x_s) + \tfrac12 s^2$. Because $x^n$ was reached from $x^0$ by minimizing searches, $f(x^n) \le f(x^0)$, so $x^n$ is no farther from $x_s$ than $x^0$ is, and $x_s$ lies on the same side of $x^0$ as $x^n$. Hence

$$x_s = x^0 + \sqrt{2\big(f(x^0) - f(x_s)\big)}\, d = x^n \pm \sqrt{2\big(f(x^n) - f(x_s)\big)}\, d.$$

The minus sign is to be used if $x_s$ is between $x^0$ and $x^n$. Since $x^n - x^0 = \mu d$, comparing the coefficients of d gives

$$\mu = \sqrt{2\big(f(x^0) - f(x_s)\big)} \mp \sqrt{2\big(f(x^n) - f(x_s)\big)}.$$

Consider the case when $x_s$ lies strictly between $x^0$ and $x^n$:

$$\mu = \sqrt{2\big(f(x^0) - f(x^n) + f(x^n) - f(x_s)\big)} + \sqrt{2\big(f(x^n) - f(x_s)\big)}.$$

Hence

$$\mu > \sqrt{2\big(f(x^0) - f(x^n)\big)}.$$

But by (2-5), $2\big(f(x^0) - f(x^n)\big) = \sum_{k=1}^n (\alpha^k)^2 \ge (\alpha^i)^2$ for every i, which implies that

$$|\alpha^i| / \mu < 1 \quad \text{for all } i.$$

But this last result violates the condition $|\alpha^k|/\mu \ge 1$ that must be satisfied before the substitution of the direction $x^n - x^0$ can be made. Therefore, when minimizing along the search direction $x^n - x^0$, if the minimum occurs at a point between $x^n$ and $x^0$, the direction $x^n - x^0$ must not be used in the next search cycle. If Δ is defined to be the maximum decrease in the function over any of the search directions used, then the above conditions can be stated as follows:

If either $g_3 \ge g_1$ or $(g_1 - 2g_2 + g_3)(g_1 - g_2 - \Delta)^2 \ge \tfrac12 \Delta (g_1 - g_3)^2$, then the same search directions should be used again and $x^n$ should be used as the new initial point. Otherwise $x_s$ should be used as the new initial point, and $x^n - x^0$ should be substituted for the direction along which the function decreased the most during the last search cycle.
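The decision rule just stated can be written as a single predicate. The sketch below (plain Python; `keep_old_directions` is a hypothetical name) follows the text literally, with `g1`, `g2`, `g3` as defined for the interpolation and `delta` the largest single-direction decrease Δ of the cycle.

```python
def keep_old_directions(g1, g2, g3, delta):
    """Powell's test as stated in the text.

    g1 = f(x^0), g2 = f(x^n), g3 = f(2x^n - x^0); delta is the largest decrease
    along any single search direction in the last cycle.  Returns True when the
    same directions should be reused (restart from x^n), False when x^n - x^0
    should replace the direction of largest decrease (restart from x_s).
    """
    if g3 >= g1:
        return True
    return (g1 - 2.0 * g2 + g3) * (g1 - g2 - delta) ** 2 >= 0.5 * delta * (g1 - g3) ** 2
```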


Unfortunately this new modification eliminates a useful property of the earlier method. The previous method would, for a positive definite quadratic, generate n conjugate directions after n search cycles, and thus the minimum would be achieved on the next cycle. The modification lacks this quality because of the manner in which new search directions are generated: there is no guarantee that a newly generated direction that is used to replace another one will not itself be eliminated later in the process. Thus the property of convergence after n+1 search cycles will most likely be lost. This, though, may not be as serious a problem as it seems, for most functions to be dealt with will not be quadratic anyway.

A common criterion for convergence is whether the function has decreased significantly in value over a single cycle. As has been stated previously, for some functions this could produce a premature halt to the procedure. Powell has suggested that the following criterion be used instead [13].

1. Continue the iterative process until the change in each variable over one cycle is less than one tenth of the required accuracy. Let the resulting point in the last cycle be a.

2. Increase each variable by ten times the required accuracy and repeat step 1, producing the point b.

3. Minimize along the line joining a and b to obtain the point c. Stop the process if the components of (a-c) and (b-c) are all less than one tenth of the required accuracy.

4. Otherwise replace $d^1$ by (a-c) and start step 1 again.

It appears that this convergence criterion is a very strict one. Since this method requires a large number of function evaluations, a convergence criterion which is too strict could cause the computer to do a great deal more work than is necessary. Of course, the nature of the problem will dictate the amount of accuracy required, which will ultimately affect the proper choice of convergence criterion. But in all cases some accuracy must be sacrificed to avoid too many evaluations of the function.

The following method was designed in an attempt to alleviate another problem that arises in Powell's revised method: the requirements that must be satisfied before a new direction can be accepted are much too demanding for problems involving a large number of variables. The result is that frequently one set of directions is used over and over again, which, as is readily apparent, is similar to the alternating variable method discussed previously. It is therefore desirable to relax these requirements, provided the main characteristics of the method can be maintained.


C. ZANGWILL'S METHOD 
Zangwill proposed the following revision of Powell's 
method in hopes of increasing what might be a slow rate 


of convergence in the former method [6]. The procedure involves the use of two different sets of search directions. The set $e_i$, $i = 1, \dots, n$, is the set of co-ordinate directions, normalized so that $\|e_i\| = 1$. These directions remain unchanged throughout the entire procedure. The other set, $d_i$, $i = 1, \dots, n$, where $\|d_i\| = 1$, contains n linearly independent directions. These last vectors are used in much the same manner as the search directions employed in Powell's method; thus it is this set that will change after each search cycle. The process is begun from the initial point $x^0$. Then $\lambda^0$ is calculated to minimize $f(x^0 + \lambda^0 d_n^1)$, and $x_{n+1}^0$ is defined as the point $x^0 + \lambda^0 d_n^1$. The value of t initially is set equal to one. The procedure then becomes iterative. Thus for the first iteration t, the point $x_{n+1}^0$ and the directions $d_i^1$, $i = 1, \dots, n$, are all known. In general, for the kth iteration assume that t, the point $x_{n+1}^{k-1}$ and the directions $d_i^k$, $i = 1, \dots, n$, are given. The kth iteration proceeds as follows:

(i) Find α to minimize $f(x_{n+1}^{k-1} + \alpha e_t)$. Update t so that t is replaced by t+1 if $1 \le t < n$, and t is replaced by 1 if t = n. Let $x_1^k = x_{n+1}^{k-1} + \alpha e_t$, where $e_t$ is the direction just searched. If α = 0, repeat step (i). If step (i) is repeated n times in succession, then no progress has been made when searching over the n co-ordinate directions. This will happen only when the minimum has been reached, and thus the process should be halted. In this case the point at which the function is a minimum is $x_{n+1}^{k-1}$.

(ii) For i = 1, ..., n, calculate $\lambda_i^k$ to minimize $f(x_i^k + \lambda_i^k d_i^k)$ and set $x_{i+1}^k = x_i^k + \lambda_i^k d_i^k$. Let $d_{n+1}^k = (x_{n+1}^k - x_{n+1}^{k-1}) / \|x_{n+1}^k - x_{n+1}^{k-1}\|$. Calculate $\lambda_{n+1}^k$ to minimize $f(x_{n+1}^k + \lambda_{n+1}^k d_{n+1}^k)$ and replace $x_{n+1}^k$ by $x_{n+1}^k + \lambda_{n+1}^k d_{n+1}^k$. Set $d_i^{k+1} = d_{i+1}^k$ for i = 1, ..., n.

Now go to iteration k+1.

It can readily be seen that step (ii) is very similar to Powell's method, which was discussed earlier.
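Zangwill's iteration can be sketched in Python with NumPy on a positive definite quadratic, where exact line searches are available in closed form. The starting directions $d_i^1 = e_i$, the zero-step tolerance and the function names are assumptions made for the illustration; on such a function the theorem below predicts a stop in step (i) of some iteration $k \le n$.

```python
import numpy as np


def line_step(A, b, x, d):
    """Exact step a minimizing f(x + a*d) for f(x) = 0.5 x'Ax + b'x."""
    return -(d @ (A @ x + b)) / (d @ A @ d)


def zangwill(A, b, x0, max_iter=50, tol=1e-12):
    """Sketch of Zangwill's iteration on a positive definite quadratic.

    Returns (x, k), where k is the iteration whose step (i) found no progress.
    """
    n = len(x0)
    E = np.eye(n)                                # fixed co-ordinate directions e_1..e_n
    D = [E[i].copy() for i in range(n)]          # changing directions (assumed start: e_i)
    x0 = np.asarray(x0, dtype=float)
    x_prev = x0 + line_step(A, b, x0, D[-1]) * D[-1]   # x_{n+1}^0, found along d_n^1
    t = 0                                        # 0-based index of the next e_t
    for k in range(1, max_iter + 1):
        # Step (i): co-ordinate searches, cycling t, until one makes progress.
        for _ in range(n):
            e = E[t]
            alpha = line_step(A, b, x_prev, e)
            t = (t + 1) % n
            if abs(alpha) > tol:
                break
        else:
            return x_prev, k                     # n zero steps in succession: stop
        x = x_prev + alpha * e                   # x_1^k
        # Step (ii): minimize along d_1^k, ..., d_n^k in turn.
        for d in D:
            x = x + line_step(A, b, x, d) * d
        s = x - x_prev                           # x_{n+1}^k - x_{n+1}^{k-1}
        d_new = s / np.linalg.norm(s)            # d_{n+1}^k
        x = x + line_step(A, b, x, d_new) * d_new
        D = D[1:] + [d_new]                      # d_i^{k+1} = d_{i+1}^k
        x_prev = x
    return x_prev, max_iter
```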

Zangwill has proven the following important theorem 
concerning the above method. 

THEOREM: Let f be a quadratic function with a positive definite Hessian A. The above procedure stops at an optimal point in step (i) of iteration k, where $k \le n$. (Recall that Powell's method could guarantee this type of convergence only if the linear independence of the search directions was maintained.)

PROOF: The proof will be by induction. Assume that at the beginning of the kth iteration the method has generated k mutually conjugate directions, $d_{n-k+1}^k, \dots, d_n^k$. The way in which this is done has been discussed earlier in this paper. Assume that the procedure does not stop during step (i). Then a new point has been generated, which implies that $x_1^k \ne x_{n+1}^{k-1}$. Since $x_1^k = x_{n+1}^{k-1} + \alpha e_t$, where $\alpha \ne 0$ and α was chosen to minimize $f(x_{n+1}^{k-1} + \alpha e_t)$, then $f(x_1^k) < f(x_{n+1}^{k-1})$. Write $y^k$ for the point reached after the n searches of step (ii), before the final search along $d_{n+1}^k$. Since $y^k$ was generated by n minimizing searches beginning from $x_1^k$, $f(y^k) \le f(x_1^k)$. Therefore $f(y^k) \le f(x_1^k) < f(x_{n+1}^{k-1})$, which implies that $y^k \ne x_{n+1}^{k-1}$, and thus $y^k - x_{n+1}^{k-1} \ne 0$.

During the (k-1)th iteration $d_{n-k+1}^k, \dots, d_n^k$ were used as the last k search directions, since $d_i^k = d_{i+1}^{k-1}$; during the kth iteration they are again the last k directions searched in step (ii). Therefore the points $x_{n+1}^{k-1}$ and $y^k$ were both found by minimizing, along these mutually conjugate directions in succession, over parallel copies of the k-dimensional space they span. Therefore, as was shown previously, $d_{n+1}^k$, which is a nonzero multiple of $y^k - x_{n+1}^{k-1}$, is conjugate to the directions $d_{n-k+1}^k, \dots, d_n^k$, and the next iteration begins with k+1 mutually conjugate directions. Hence at the start of the nth iteration n conjugate search directions have been generated, and $x_{n+1}^{n-1}$, having been found by minimizing along all n of them in succession, minimizes f over the whole space. Step (i) of iteration n therefore makes no progress and the procedure stops.

It remains to be shown that n conjugate vectors are linearly independent. Assume that they are not independent, which implies that for some j

$$d_j^k = \sum_{i \ne j} \beta_i d_i^k.$$

Then

$$(d_j^k)' A d_j^k = \sum_{i \ne j} \beta_i (d_j^k)' A d_i^k = 0.$$

But since A was assumed to be positive definite, $(d_j^k)' A d_j^k > 0$, which is a contradiction. Thus the assumption that the vectors are linearly dependent must be false. Since the above argument holds for k = 1 (a single direction $d_n^1$, along which $x_{n+1}^0$ was found by minimization), the induction proof is complete.

Obviously, since most functions of interest will not be quadratic away from the minimum, this method will not generate true conjugate directions there. For general functions it is not certain that this method is more efficient than the other methods presented earlier when away from the minimum. However, in the neighborhood of the minimum, where the function is nearly quadratic, this latest method promises to be the most favorable thus far available of those using only function evaluations.

IV. STEEPEST DESCENT AND NEWTON'S METHOD

The method of Steepest Descent and Newton's method have been placed together in this chapter, though the theory and procedures of these methods are quite dissimilar. The Steepest Descent method uses only information about the first partial derivatives, while Newton's method requires knowledge of the second partial derivatives.

They have been combined in this chapter because they are the two classical approaches to the minimization problem that remain useful today. They differ so much from the more recent methods that they deserve a classification of their own. Many of the more recent methods have been devised as improvements of these two.


A. STEEPEST DESCENT BY CAUCHY 

This method was one of the first techniques suggested to solve the problem under consideration. The main principle involved here is the use of $-\nabla f(x)$ as the search direction from x. This selection seems natural, since a search in this direction from the point x ensures that the function will, at least initially, decrease most rapidly. When far from the minimum this direction seems to be the most useful, for it offers the opportunity to approach the minimum in one step rather than having to use n search directions as in the methods discussed previously. Unfortunately, as the minimum is approached this direction tends to be less and less useful.


One reason for this is that round-off errors and inaccuracy in determining the gradient can have a great effect upon the search directions: near the minimum the gradient itself is small, so an error of fixed size in computing it can change its direction greatly. This is demonstrated by the following diagram.

From the above it can be seen that the direction that should be used and the one that is actually used might be almost perpendicular. Another problem for some functions is that this method may generate directions that cause the search to oscillate about the principal axis of the function, with very little progress made in each search. Problems such as these significantly reduce the effectiveness of this method.

This method can be used in either of two ways: with a fixed step size, or by minimizing along the search direction. The latter technique requires minimization along $-\nabla f(x)$. The fixed step method sets the distance which is traveled along the search direction. The function is evaluated at this new point and this value is compared with the value of the function at the previous point. As long as the function is decreased, the new point replaces the previous one and the gradient is evaluated at this point to determine the next search direction. If one of these steps fails to reduce the value of the function, then the step size is reduced and the process continued. Convergence is assumed to occur when the step size is reduced below a specified limit.
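The fixed step variant can be sketched as follows (Python with NumPy). The initial step, the halving factor and the stopping limit are illustrative choices rather than values from the text, and the direction is normalized so that `step` is the distance actually traveled.

```python
import numpy as np


def steepest_descent_fixed_step(f, grad, x0, step=1.0, shrink=0.5,
                                min_step=1e-8, max_iter=100_000):
    """Fixed-distance steepest descent with step reduction, as described above."""
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    g = grad(x)
    for _ in range(max_iter):
        if step < min_step:                    # convergence: step size below the limit
            break
        gnorm = np.linalg.norm(g)
        if gnorm == 0.0:                       # exactly stationary point
            break
        trial = x - step * g / gnorm           # travel a fixed distance along -grad f
        f_trial = f(trial)
        if f_trial < fx:                       # success: accept, evaluate new gradient
            x, fx = trial, f_trial
            g = grad(x)
        else:                                  # failure: reduce the step size
            step *= shrink
    return x
```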

Both of these methods have their relative advantages. The fixed step technique requires fewer function evaluations per step, while the minimization process should converge in fewer steps. Unfortunately, though, neither method will, in general, proceed very rapidly when close to the minimum. In 1957, Booth suggested that the point nine tenths of the distance to the minimum along the search direction should be used instead of the actual minimum. The purpose of this is to attempt to reduce the oscillation about the principal axis which is typical of this classical method. This simple procedure does reduce the problem, but not enough to make the whole procedure useful for general functions [3].
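For a quadratic $f(x) = \tfrac12 x'Ax + b'x$ the minimization along $-\nabla f$ has the closed form $\alpha = g'g / g'Ag$, which makes both the exact search and Booth's nine-tenths variant easy to sketch in Python with NumPy. The test function and tolerances are assumptions for illustration, and the sketch makes no claim about which variant is faster.

```python
import numpy as np


def steepest_descent_exact(A, b, x0, factor=1.0, tol=1e-10, max_iter=10_000):
    """Steepest descent on f(x) = 0.5 x'Ax + b'x with exact (or shortened) line searches.

    factor=1.0 moves to the minimum along -g; factor=0.9 moves nine tenths of
    the way, as Booth suggested.  Returns the final point and iteration count.
    """
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = A @ x + b
        if np.linalg.norm(g) < tol:
            return x, k
        alpha = (g @ g) / (g @ A @ g)      # exact minimizer of f(x - alpha g)
        x = x - factor * alpha * g
    return x, max_iter
```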


B. NEWTON'S METHOD [3,9]

Sometimes the Hessian matrix may be known for the function to be minimized. Since, as has been suggested, it may be useful to take full advantage of all the information obtained about the function, it may be wise to search for a method which incorporates this second partial derivative information. Newton's method is just such a technique.

By using a second order Taylor expansion for a quadratic function, the following equation is produced:

$$f(x) = f(\hat{x}) + \sum_{j=1}^n (x_j - \hat{x}_j)\left(\frac{\partial f}{\partial x_j}\right)_{\hat{x}} + \frac12 \sum_{j=1}^n \sum_{k=1}^n (x_j - \hat{x}_j)(x_k - \hat{x}_k)\left(\frac{\partial^2 f}{\partial x_j \partial x_k}\right)_{\hat{x}},$$

where $\hat{x}$ is the point at which f is a minimum. But,

$$\frac{\partial f}{\partial x_i} = \left(\frac{\partial f}{\partial x_i}\right)_{\hat{x}} + \sum_{j=1}^n (x_j - \hat{x}_j)\left(\frac{\partial^2 f}{\partial x_i \partial x_j}\right)_{\hat{x}}.$$

At the minimum $(\partial f/\partial x_i)_{\hat{x}} = 0$ and therefore, with $h = x - \hat{x}$,

$$\frac{\partial f}{\partial x_i} = \sum_{j=1}^n \left(\frac{\partial^2 f}{\partial x_i \partial x_j}\right)_{\hat{x}} h_j, \qquad i = 1, \dots, n. \qquad \text{(3-1)}$$

Let $\partial f/\partial x_i = g_i$ and $(\partial^2 f/\partial x_i \partial x_j)_{\hat{x}} = G_{ij}$. Then equation (3-1) implies

$$g = Gh \quad \text{or} \quad h = G^{-1} g.$$

Therefore

$$\hat{x} = x - G^{-1} g, \quad \text{since } x = \hat{x} + h.$$

If this set of equations (3-1) is to be solved to yield $\hat{x}$, the gradient at the current point must be known, and the matrix of second partial derivatives must be available and evaluated at the minimum, $\hat{x}$. This last requirement poses some problems because, in general, the actual minimum must be known before this can be done. But if the function is known to be quadratic, G is constant and there is no problem.

Newton used the preceding arguments to devise an iterative method to solve the minimization problem. When the search has reached the neighborhood of the minimum, the matrix G is evaluated at the current point rather than at the actual minimum. If in the neighborhood of the minimum the function approximates a quadratic, the matrix G tends toward a constant matrix, and thus evaluating G at the current point should give a reasonable approximation to G evaluated at the minimum.
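Newton's iteration can be sketched in a few lines of Python with NumPy. The sketch assumes the caller supplies the gradient and the Hessian G as functions (the `grad` and `hess` arguments are hypothetical names), solves $Gh = g$ rather than forming $G^{-1}$ explicitly, and has no safeguard against a Hessian that is not positive definite.

```python
import numpy as np


def newton_minimize(grad, hess, x0, iters=20):
    """Newton's iteration x <- x - G(x)^{-1} g(x), with G evaluated at the current point."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad(x)
        G = hess(x)
        h = np.linalg.solve(G, g)   # solve G h = g instead of forming G^{-1}
        x = x - h
    return x
```

On a quadratic a single step lands on the minimum; on a nonquadratic function such as $e^{x_1} - x_1 + (x_2 - 1)^2$ the iteration converges rapidly to (0, 1) from a nearby starting point.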

Newton's method has been shown to be a very useful and powerful minimization technique. But like all techniques it does have its limitations. For example, the Newton step is guaranteed to point downhill only if G is positive definite, and the method may actually diverge for general functions. Another problem is the time required to generate G and $G^{-1}$ and the storage space needed for these matrices. Since the matrix G is only used as an approximation, the time problem can be somewhat reduced. This can be done by calculating G and $G^{-1}$ only after every k iterations rather than for each new step. Some of these problems are dealt with in the following method.


C. MODIFIED NEWTON'S METHOD [6]

The method now to be presented is a further refinement of Newton's method that was just discussed. It was specifically designed to alleviate certain problems that the original Newton's method could not handle. One such problem that will be dealt with is the occurrence of a Hessian matrix that might not be positive definite.

This method requires the selection of $\lambda^i$ to minimize $f(x^i + \lambda^i d_i)$. As such this process is very similar to most of the methods discussed thus far; the key, though, is in the selection of $d_i$, the search direction. This is done according to the following rules:

1. If $H_i$, the current approximation to the Hessian A, has a negative eigenvalue, then $d_i$ should be chosen to satisfy the following:

$$d_i' H_i d_i < 0 \quad \text{and} \quad d_i' \nabla f \le 0. \qquad \text{(3-2)}$$

2. If the eigenvalues of $H_i$ are all nonnegative, then choose $d_i$ such that either

$$H_i d_i = 0 \quad \text{and} \quad d_i' \nabla f < 0 \qquad \text{(3-3)}$$

or

$$H_i d_i = -\nabla f. \qquad \text{(3-4)}$$

Consider the first situation:

(i) $df(x^i + \lambda d_i)/d\lambda = d_i' \nabla f \le 0$ at $\lambda = 0$;

(ii) $d^2 f(x^i + \lambda d_i)/d\lambda^2 = d_i' H_i d_i < 0$.

Now (i) and (ii) together imply that f, at least initially, decreases in the search direction. If (ii) remains valid as $\lambda \to \infty$, then, obviously, the function is ever decreasing and thus must approach $-\infty$; in that case the function has no finite minimum, and the search has correctly found that its infimum is $-\infty$. Otherwise there must be some place along $d_i$ where H becomes positive definite or semidefinite. Once this area is reached, rule number two can be applied until such a time as another area is reached where H has a negative eigenvalue.

Now consider the second situation. A similar argument holds. In this case $d^2 f(x^i + \lambda d_i)/d\lambda^2 = d_i' H_i d_i = d_i' 0 = 0$, so f decreases linearly along $d_i$, which also implies that unless an area of positive definiteness or semidefiniteness is reached along $d_i$, the value of the function will again go to $-\infty$. Equation (3-4) obviously defines just the search direction given in the section on Newton's method.

In practice, then, this method employs the usual Newton search direction when in a region in which the Hessian is positive definite. In other regions the method selects directions which should take the search process into an area where A is positive definite.
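The rules (3-2) to (3-4) leave some freedom in how $d_i$ is picked. One possible choice, sketched here in Python with NumPy, uses the eigen-decomposition of the symmetric matrix $H_i$: an eigenvector of the most negative eigenvalue for rule 1, an eigenvector of a zero eigenvalue for (3-3), and a (least-squares) solution of $H_i d_i = -\nabla f$ for (3-4). These specific choices and the tolerance are assumptions, not part of the method as described.

```python
import numpy as np


def modified_newton_direction(H, grad, tol=1e-10):
    """Pick d according to rules (3-2)-(3-4); H is the symmetric Hessian approximation."""
    w, V = np.linalg.eigh(H)                  # eigenvalues in ascending order
    if w[0] < -tol:                           # rule 1: a negative eigenvalue is present
        d = V[:, 0]                           # d'Hd = w[0] < 0
        return -d if d @ grad > 0 else d      # ensure d'grad <= 0            (3-2)
    for v in V[:, np.abs(w) <= tol].T:        # rule 2, first alternative
        s = v @ grad
        if abs(s) > tol:
            return -np.sign(s) * v            # H d = 0 and d'grad < 0        (3-3)
    # Rule 2, second alternative: solve H d = -grad (least squares if H is singular).
    return np.linalg.lstsq(H, -grad, rcond=None)[0]                       # (3-4)
```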

In general, therefore, this modification should improve the behavior of Newton's method when away from the minimum. It should also be able to solve a more general class of problems than the classical Newton's method. Unfortunately, however, it does not completely remove all the problems arising in the use of Newton's method. The most significant disadvantage of these latest two methods is that both require a great deal of information concerning the function. For some functions it could be just as time consuming to compute second partial derivatives as it is to solve the problem by some other procedure. Difficulties also arise in problems of large dimension: inverting an n x n matrix requires a great deal of work, if it can be done at all. For these reasons other methods have been developed to approximate $H = A^{-1}$.

V. VARIABLE METRIC METHODS

This is the last major classification of methods to be discussed in this paper. As such it represents some of the most recent developments in the field of function minimization. The theory behind these methods is rather simple in concept. It involves making better and better approximations to the matrix $H = A^{-1}$, where the function to be minimized is assumed to be approximated by a quadratic function f of the form $f(x) = \tfrac12 x'Ax + b'x + c$. If f were actually quadratic, then knowledge of $A^{-1}$ would allow the minimum to be reached in one step, as was shown by equation (1-1), Chapter I. Thus any method which can generate this matrix would indeed be valuable.


A. DAVIDON, FLETCHER, AND POWELL [4]

The original work in this area was presented by Davidon in 1959, but Fletcher and Powell took Davidon's original method and improved upon it to the extent that theirs has become one of the more popular and reliable methods available for minimization. Though most of the original work was done by Davidon, the notation and arguments of Fletcher and Powell are more concise and will be used in the discussion to follow.

Let $g_i = \nabla f(x^i)$ and $d_i = -H_i g_i$, where $H_i$ is the ith approximation to $A^{-1}$. The matrix A is assumed to be positive definite, and $H_0$, the first estimate of H, is usually selected as the identity matrix. Note that this selection for $H_0$ produces an initial search direction that is simply that of the Steepest Descent method.


Let us define the vectors

$$p_i = \lambda_i d_i = x^{i+1} - x^i$$

and

$$q_i = g_{i+1} - g_i.$$

The vector $p_i$ is the step to the minimum along $d_i$ from the point $x^i$, and $q_i$ is the corresponding change in the gradient.

For this method it is desired to repeatedly update the matrix $H_i$ to make better and better approximations to $A^{-1}$. Consider a recursion formula which generates $H_i$'s with the following properties: the vectors $p_0, \dots, p_{k-1}$ are linearly independent, and they are eigenvectors of $H_k A$ with eigenvalue one. Then, obviously, $H_n A$ will have n linearly independent eigenvectors with eigenvalue one. But this can occur only if $H_n A = I$, and thus $H_n = A^{-1}$ as desired. It will be established that the following recursion formula satisfies these requirements:

$$H_{i+1} = H_i + \frac{p_i p_i'}{p_i' q_i} - \frac{H_i q_i q_i' H_i}{q_i' H_i q_i}. \qquad \text{(4-1)}$$


First let us show that $p_i$ is an eigenvector of $H_{i+1} A$ with eigenvalue one. To do this consider:

$$q_i = g_{i+1} - g_i = (Ax^{i+1} + b) - (Ax^i + b) = A(x^{i+1} - x^i) = A p_i. \qquad \text{(4-2)}$$

Hence

$$H_{i+1} A p_i = H_{i+1} q_i = H_i q_i + \frac{p_i (p_i' q_i)}{p_i' q_i} - \frac{H_i q_i (q_i' H_i q_i)}{q_i' H_i q_i} = p_i.$$


Next, if the following two results can be established, the desired properties of $p_i$, $i = 0, \dots, n-1$, will be established:

$$p_i' A p_j = 0, \quad 0 \le i < j < k, \qquad \text{(4-3)}$$

$$H_k A p_i = p_i, \quad 0 \le i < k. \qquad \text{(4-4)}$$

Equation (4-3) implies conjugacy, which has been shown to imply linear independence, and equation (4-4) is the desired result concerning eigenvalues and eigenvectors. Equations (4-3) and (4-4) will be established by induction.


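Consider equation (4-4) with k = 1:

$$H_1 A p_0 = p_0 \qquad \text{(4-5)}$$

by (4-2) and the result just shown. Consider (4-3) with k = 2:

$$p_0' A p_1 = -\lambda_1 p_0' A H_1 g_1 = -\lambda_1 (H_1 A p_0)' g_1 = -\lambda_1 p_0' g_1 = 0, \qquad \text{(4-6)}$$

since we minimize along $p_0$, so that $p_0' g_1 = 0$.

Now assume that

$$p_i' A p_j = 0, \quad 0 \le i < j < k,$$

and

$$H_k A p_i = p_i, \quad 0 \le i < k.$$

Consider, for $0 \le i < k$:

$$g_k = g_{i+1} + \sum_{j=i+1}^{k-1} q_j = g_{i+1} + A \sum_{j=i+1}^{k-1} p_j.$$

Hence, we see that

$$p_i' g_k = p_i' g_{i+1} + \sum_{j=i+1}^{k-1} p_i' A p_j = 0, \quad 0 \le i < k, \qquad \text{(4-7)}$$

because the first term vanishes by the minimization along $p_i$ and the others vanish by the induction hypothesis, and hence

$$p_i' A p_k = -\lambda_k p_i' A H_k g_k = -\lambda_k (H_k A p_i)' g_k = -\lambda_k p_i' g_k = 0.$$

But this is equivalent to the following:

$$p_i' A p_j = 0, \quad 0 \le i < j < k+1. \qquad \text{(4-8)}$$

Also, for $0 \le i < k$, using $H_k A p_i = p_i$ and $q_k' = p_k' A$,

$$H_{k+1} A p_i = H_k A p_i + \frac{p_k p_k' A p_i}{p_k' q_k} - \frac{H_k q_k q_k' H_k A p_i}{q_k' H_k q_k} = p_i + \frac{p_k (p_k' A p_i)}{p_k' q_k} - \frac{H_k q_k (p_k' A p_i)}{q_k' H_k q_k}.$$

Hence, by (4-8), and with the case i = k supplied by the eigenvector result established above,

$$H_{k+1} A p_i = p_i, \quad 0 \le i < k+1. \qquad \text{(4-9)}$$

With equations (4-5), (4-6), (4-8), and (4-9) the induction proof has been completed. Thus $H_n = A^{-1}$, as was desired.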
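The result can be checked numerically. The sketch below (Python with NumPy) runs n steps of the recursion (4-1) with exact line searches on a positive definite quadratic, which is the setting assumed in the proof; the test matrix and the helper names are illustrative assumptions.

```python
import numpy as np


def dfp_update(H, p, q):
    """Davidon-Fletcher-Powell update, equation (4-1)."""
    Hq = H @ q
    return H + np.outer(p, p) / (p @ q) - np.outer(Hq, Hq) / (q @ Hq)


def dfp_on_quadratic(A, b, x0):
    """n DFP steps with exact line searches on f(x) = 0.5 x'Ax + b'x (sketch)."""
    n = len(x0)
    x = np.asarray(x0, dtype=float)
    H = np.eye(n)                          # H_0 = I: first direction is steepest descent
    g = A @ x + b
    steps = []
    for _ in range(n):
        if np.linalg.norm(g) < 1e-14:      # already at the minimum
            break
        d = -H @ g                         # d_i = -H_i g_i
        lam = -(d @ g) / (d @ A @ d)       # exact line search, valid for a quadratic only
        p = lam * d                        # p_i = x^{i+1} - x^i
        x = x + p
        g_new = A @ x + b
        q = g_new - g                      # q_i = g_{i+1} - g_i (= A p_i)
        H = dfp_update(H, p, q)
        g = g_new
        steps.append(p)
    return x, H, steps
```

For a generic starting point the final H should agree with $A^{-1}$ to rounding error, and the steps $p_i$ should be mutually conjugate.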
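An obvious question at this point is: what motivated the choice of the recursion formula? Consider the following. Let $d_1, \dots, d_n$ be mutually conjugate directions, and let $D = (d_1 \cdots d_n)$. Then $D'AD$ is the diagonal matrix with entries $d_j' A d_j$, and since D is nonsingular,

$$A^{-1} = D (D'AD)^{-1} D' = \sum_{j=1}^n \frac{d_j d_j'}{d_j' A d_j}.$$

Applying this to the mutually conjugate steps $p_0, \dots, p_{n-1}$ and using $q_j = A p_j$ from (4-2) gives

$$A^{-1} = \sum_{j=0}^{n-1} \frac{p_j p_j'}{p_j' A p_j} = \sum_{j=0}^{n-1} \frac{p_j p_j'}{p_j' q_j}.$$

Thus we see that the second term in the recursion formula was selected to make the approximation approach $A^{-1}$.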
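The identity $A^{-1} = \sum_j d_j d_j' / (d_j' A d_j)$ can be checked with any conjugate set. This sketch (Python with NumPy; the helper names and the test matrix are hypothetical) builds one by Gram-Schmidt orthogonalization in the inner product $u'Av$ and then forms the sum.

```python
import numpy as np


def conjugate_directions(A, V):
    """A-conjugate Gram-Schmidt applied to the columns of V (assumed independent)."""
    P = []
    for v in V.T:
        p = v.astype(float)
        for u in P:
            p = p - (u @ A @ v) / (u @ A @ u) * u    # remove the A-component along u
        P.append(p)
    return np.column_stack(P)


def inverse_from_conjugate(A, P):
    """Sum over the columns p_j of P of p_j p_j' / (p_j' A p_j)."""
    return sum(np.outer(p, p) / (p @ A @ p) for p in P.T)
```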

It will now be shown that the third term on the right side of equation (4-1) is added as a correction factor. As was shown previously, it was necessary that $H_{i+1} A p_i = p_i$ to make this method valid. Consider the following:

$$H_{i+1} = H_i + \frac{p_i p_i'}{p_i' q_i} + C_i.$$

It will now be determined what form $C_i$ must take in order to satisfy the condition that $H_{i+1} A p_i = p_i$:

$$H_{i+1} A p_i = H_i A p_i + \frac{p_i p_i' A p_i}{p_i' q_i} + C_i A p_i = H_i A p_i + p_i + C_i A p_i.$$

Hence

$$H_i A p_i = -C_i A p_i \quad \text{or} \quad H_i q_i = -C_i q_i.$$

A solution for $C_i$ to this equation is

$$C_i = -\frac{H_i q_i z'}{z' q_i},$$

where z is an arbitrary vector that is not perpendicular to $q_i$. For this choice of $C_i$:

$$C_i q_i = -\frac{H_i q_i (z' q_i)}{z' q_i} = -H_i q_i,$$

as desired. But since it is desired that $C_i$ be symmetric, z is set equal to $H_i q_i$. Therefore

$$C_i = -\frac{H_i q_i q_i' H_i}{q_i' H_i q_i},$$

which is as desired.


Now consider the search direction $d_i$. If it can be shown that $g_i' H_i g_i$ is positive, then $d_i' g_i = -g_i' H_i g_i < 0$, so the search direction always points toward decreasing function values, and thus the $\lambda_i$ can be chosen positive. But if $H_i$ can be shown to be positive definite, then $g_i' H_i g_i > 0$ as desired. Since $H_0$ was chosen positive definite, it remains to be shown by an induction argument that, by (4-1), if $H_i$ is positive definite then so is $H_{i+1}$. Consider, for any $x \ne 0$,

$$x' H_{i+1} x = x' H_i x + \frac{(x' p_i)^2}{p_i' q_i} - \frac{(x' H_i q_i)^2}{q_i' H_i q_i} = \frac{(x' H_i x)(q_i' H_i q_i) - (x' H_i q_i)(q_i' H_i x)}{q_i' H_i q_i} + \frac{(x' p_i)^2}{p_i' q_i}.$$

But $(x' H_i x)(q_i' H_i q_i) \ge (x' H_i q_i)(q_i' H_i x)$ by Schwarz's inequality applied to the inner product $u' H_i v$, with equality only if x and $q_i$ are parallel. Therefore

$$x' H_{i+1} x \ge \frac{(x' p_i)^2}{p_i' q_i}.$$

Obviously $(x' p_i)^2$ is greater than or equal to zero, so it remains to be shown that $p_i' q_i > 0$ for their quotient to be nonnegative:

$$p_i' q_i = p_i' (g_{i+1} - g_i) = -p_i' g_i = \lambda_i g_i' H_i g_i > 0,$$

where $p_i' g_{i+1} = 0$ by the minimization along $d_i$, and the final inequality holds since $H_i$ was assumed to be positive definite and $\lambda_i > 0$. Finally, if x is not parallel to $q_i$, the first term above is strictly positive; if $x = c\, q_i$ with $c \ne 0$, then $x' p_i = c\, q_i' p_i \ne 0$ and the second term is strictly positive. Hence $x' H_{i+1} x > 0$ for all nontrivial x; this implies that $H_{i+1}$ is positive definite, and thus the proof is complete.
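The role of $p_i' q_i > 0$ in this argument shows up in a tiny example. The sketch below (Python with NumPy; the vectors are made up for illustration) applies (4-1) to $H = I$ once with $p'q > 0$ and once with $p'q < 0$, testing positive definiteness by attempting a Cholesky factorization.

```python
import numpy as np


def dfp_update(H, p, q):
    """Davidon-Fletcher-Powell update, equation (4-1)."""
    Hq = H @ q
    return H + np.outer(p, p) / (p @ q) - np.outer(Hq, Hq) / (q @ Hq)


def is_positive_definite(M):
    """True if the symmetric matrix M admits a Cholesky factorization."""
    try:
        np.linalg.cholesky(M)
        return True
    except np.linalg.LinAlgError:
        return False


H = np.eye(2)
good = dfp_update(H, np.array([1.0, 0.0]), np.array([1.0, 1.0]))    # p'q = 1 > 0
bad = dfp_update(H, np.array([1.0, 0.0]), np.array([-1.0, 0.0]))    # p'q = -1 < 0
print(is_positive_definite(good), is_positive_definite(bad))        # True False
```

With exact line searches on a positive definite quadratic the proof guarantees the first case, so the second case can only arise when those assumptions are dropped.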

By proving that $H_i$ is positive definite for every i, it has been ensured that for a quadratic function this method is completely stable and will produce the minimum in at most n steps. The last part of this section, concerning positive definiteness, is specifically due to Fletcher and Powell.

The ease with which this method can be applied and its strong stability make it one of the most useful methods thus far discussed. As with all methods, it will only be approximate for functions which are more general than the quadratic function discussed above; but it would be natural to expect that it would still usually outperform the other techniques.

Of course, there are some problems present in this method also. Like most of the methods discussed, Davidon's technique requires determining the minimum along a given direction. While this may or may not be a major problem, it will require additional function evaluations which must be considered. Also there could be a storage problem, especially for problems of large dimension, since at each step an n x n matrix must be saved. It is the first of these problems that the next method attempts to avoid. It is not unusual for the necessary function and gradient evaluations to use the greater part of the total computer time required for the solution of the problem. Thus, if the number of evaluations is reduced without altering the basic method itself, the result would be expected to be a more efficient method.


B. MURTAGH AND SARGENT

This method, devised by Murtagh and Sargent [10], uses a recursion formula similar to that employed by Davidon, Fletcher, and Powell. The recursion formula used in generating $A^{-1}$ is

$$H_k = H_{k-1} + \frac{(p_k - H_{k-1}q_k)(p_k - H_{k-1}q_k)'}{q_k'(p_k - H_{k-1}q_k)}. \qquad (4\text{-}10)$$

As will be shown, the advantage of this method is that it does not require minimization along each search direction. This new formula can be developed as follows.

Let us assume, as before, that the function is approximated by a quadratic, $f(x) = x'Ax/2 + b'x + c$. Then

$$g(x_k) = Ax_k + b, \qquad g(x_{k-1}) = Ax_{k-1} + b,$$

and

$$g(x_k) - g(x_{k-1}) = A(x_k - x_{k-1}),$$

or

$$q_k = Ap_k,$$

where

$$q_k = g(x_k) - g(x_{k-1}) \quad\text{and}\quad p_k = x_k - x_{k-1},$$

as obtained earlier. Now let $H$ be an approximation to $A^{-1}$. If $H$ were exact, then $p_k = Hq_k$. But since $H$ is not exact, there is an error involved. Let $e$ be this error, $e = p_k - Hq_k$, and consider now adding $\Delta H$ to $H$ so that $(H + \Delta H)q_k = p_k$. Then

$$p_k = Hq_k + \Delta Hq_k, \qquad \Delta Hq_k = p_k - Hq_k = e.$$
If $\Delta H$ is chosen so that each column is a multiple of $e$, then $\Delta Hq_k$ is also a multiple of $e$. Since it is desirable for $H$ to remain symmetric, the following choice for $\Delta H$ is made:

$$\Delta H = m\,ee',$$

in which $m$ is a constant to be determined. We require

$$m\,e(e'q_k) = e,$$

so that

$$m = 1/e'q_k = 1/(p_k - Hq_k)'q_k$$

and

$$\Delta H = (p_k - Hq_k)(p_k - Hq_k)'/(p_k - Hq_k)'q_k.$$

With $H = H_{k-1}$ and $H + \Delta H = H_k$, this produces the following recursion formula:

$$H_k = H_{k-1} + (p_k - H_{k-1}q_k)(p_k - H_{k-1}q_k)'/q_k'(p_k - H_{k-1}q_k). \qquad (4\text{-}11)$$
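A short Python sketch shows the property that makes (4-11) attractive. It applies the rank-one update, starting from $H_0 = I$, to three arbitrarily chosen, linearly independent steps $p_k$ on a quadratic with $q_k = Ap_k$; no line search is performed anywhere. After each update $H_kq_k = p_k$ holds, and, provided no denominator $q_k'(p_k - H_{k-1}q_k)$ vanishes, after $n$ updates the matrix reproduces every earlier pair and therefore equals $A^{-1}$. The matrix $A$, the choice of unit vectors as steps, and the name `ms_update` are assumptions made only for this example.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def ms_update(H, p, q):
    """Rank-one recursion (4-11): H + z z' / q'z with z = p - H q."""
    z = [pi - hi for pi, hi in zip(p, matvec(H, q))]
    c = dot(q, z)
    if c == 0.0:
        raise ZeroDivisionError("q'(p - Hq) vanishes; the update is undefined")
    n = len(z)
    return [[H[i][j] + z[i] * z[j] / c for j in range(n)] for i in range(n)]

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
H = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]      # H_0 = I
steps = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # arbitrary p_k, no line search
pairs = []
for p in steps:
    q = matvec(A, p)          # q_k = A p_k for a quadratic
    H = ms_update(H, p, q)
    pairs.append((p, q))
    # the newest pair is reproduced exactly (up to rounding): prints True
    print(max(abs(u - v) for u, v in zip(matvec(H, q), p)) < 1e-12)
```

Since every column of $A$ appears as some $q_k$ here, reproducing all three pairs is the same as $H_3A = I$.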


It should be recalled at this time that for a quadratic function the step to the minimum is given as follows:

$$x^* = x - A^{-1}g(x). \qquad (4\text{-}12)$$

A similar search direction will be employed in this method, with the addition of an arbitrary scalar $\alpha_k$, so that

$$x_{k+1} = x_k - \alpha_kH_kg(x_k). \qquad (4\text{-}13)$$

The scalar is added to this formula because the step in (4-12) may at times provide a poor estimate of the distance to the minimum.

The advantage of the recursion formula developed in 
(4-11) is that it is not required to minimize the function 
along each search direction. As was stated previously, 
this requirement for minimization was a disadvantage of 
Davidon's method. Unfortunately, though, this alteration does reduce the stability. Murtagh and Sargent prove the following theorem concerning the convergence of their method [10].
THEOREM: Assume that the function $f(x)$ is defined on $U \subset E^n$ and is such that

i. $f(x)$ is continuous on $\Omega = \{x \mid x \in U;\ f(x) \le c\}$, and $\Omega$ is closed and bounded;

ii. $f(x)$ has continuous second derivatives on $\Omega' = \{x \mid x \in U;\ f(x) < c'\}$, and there is a $\Lambda$ such that $\|H(x)\| \le \Lambda$ for $x \in \Omega'$, where $H(x)$ here denotes the matrix of second derivatives of $f$.

Starting at any point $x_0 \in \Omega$ with $g(x_0) \ne 0$, we generate a sequence $x_0, x_1, \ldots, x_k, x_{k+1}, \ldots$ from

$$x_{k+1} = x_k - \alpha_kH_kg_k.$$

Then if the matrices $H_k$ satisfy the conditions

$$\rho\|g_k\| \le \|H_kg_k\| \le \sigma\|g_k\|, \qquad |g_k'H_kg_k| \ge \delta\|g_k\|\,\|H_kg_k\|,$$

where $\rho$, $\sigma$, and $\delta$ are fixed positive constants, it is always possible to choose a finite nonzero $\alpha_k$ at each step such that

$$f(x_k) - f(x_{k+1}) \ge \varepsilon\,\alpha_k\,g_k'H_kg_k,$$

with $\varepsilon$ a fixed positive constant less than unity. With $\alpha_k$ so chosen, the sequence $\{x_k\}$ lies in $\Omega$ and tends to $\Omega^* = \{x \mid x \in \Omega;\ g(x) = 0\}$ in the sense that the distance $d(x_k, \Omega^*)$ of $x_k$ from $\Omega^*$ tends to zero as $k \to \infty$.
Murtagh and Sargent's method satisfies the conditions of the above theorem if $H_k$ is positive definite for all $k$. A theorem by Resco (1967) can be used to establish conditions under which $H_k$ will be positive definite.


Let

$$z_k = p_k - H_{k-1}q_k \quad\text{and}\quad c_k = q_k'z_k.$$

Define

$$H(t) = H_{k-1} + t\,z_kz_k'/c_k. \qquad (4\text{-}14)$$

By Caratheodory's theorem, $H(t)$ is positive definite in the range $0 \le t \le 1$ if $H_{k-1}$ is positive definite and $H(t)$ is nonsingular over this range. Since $H_k = H_{k-1} + z_kz_k'/c_k = H(1)$, to show that $H_k$ is positive definite, assuming that $H_{k-1}$ is positive definite, it is sufficient to show that $H(t)$ is nonsingular for $0 \le t \le 1$. From (4-14),


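$$\begin{aligned}
\det H(t) &= \det\left(H_{k-1} + t\,z_kz_k'/c_k\right) \\
&= \det H_{k-1}\,\det\left(I + t\,H_{k-1}^{-1}z_kz_k'/c_k\right) \\
&= \det H_{k-1}\left(1 + t\,z_k'H_{k-1}^{-1}z_k/c_k\right) \\
&= \det H_{k-1}\left(1 + t\,z_k'H_{k-1}^{-1}p_k/c_k - t\,z_k'q_k/c_k\right) \\
&= \det H_{k-1}\left(1 - t + t\,z_k'H_{k-1}^{-1}(-\alpha_{k-1}H_{k-1}g_{k-1})/c_k\right) \\
&= \det H_{k-1}\left(1 - t - \alpha_{k-1}t\,z_k'g_{k-1}/c_k\right),
\end{aligned}$$

using $p_k = -\alpha_{k-1}H_{k-1}g_{k-1}$ and $z_k'q_k = c_k$. It is necessary that

$$1 - t - \alpha_{k-1}t\,z_k'g_{k-1}/c_k > 0 \quad\text{for } 0 \le t \le 1 \qquad (4\text{-}15)$$

for $H_k$ to be positive definite and nonsingular. Setting $t = 1$ and noting that $\alpha_{k-1}$ is positive, it is necessary that

$$z_k'g_{k-1}/c_k < 0. \qquad (4\text{-}16)$$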


Since a positive definite matrix (usually $I$) can be chosen for $H_0$, equation (4-16) is a necessary condition for $H_k$ to be a positive definite matrix for all $k$. Consider the numerator and denominator of (4-16) separately.


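$$\begin{aligned}
z_k'g_{k-1} &= (p_k - H_{k-1}q_k)'g_{k-1} \\
&= -\alpha_{k-1}g_{k-1}'H_{k-1}g_{k-1} - (g_k - g_{k-1})'H_{k-1}g_{k-1} \\
&= (1 - \alpha_{k-1})\,g_{k-1}'H_{k-1}g_{k-1} - g_k'H_{k-1}g_{k-1}, \qquad (4\text{-}17)
\end{aligned}$$

$$\begin{aligned}
c_k &= z_k'q_k = z_k'(g_k - g_{k-1}) \\
&= (p_k - H_{k-1}q_k)'g_k - z_k'g_{k-1} \\
&= -\alpha_{k-1}g_{k-1}'H_{k-1}g_k - (g_k - g_{k-1})'H_{k-1}g_k - z_k'g_{k-1} \\
&= (1 - \alpha_{k-1})\,g_{k-1}'H_{k-1}g_k - g_k'H_{k-1}g_k - z_k'g_{k-1}. \qquad (4\text{-}18)
\end{aligned}$$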


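Solving for $\alpha_{k-1}$ in equation (4-17), substituting into (4-18), and multiplying through by $g_{k-1}'H_{k-1}g_{k-1}$ produces the following result (recall that $g_{k-1}'H_{k-1}g_k = g_k'H_{k-1}g_{k-1}$, since $H_{k-1}$ is symmetric):

$$c_k\,g_{k-1}'H_{k-1}g_{k-1} = z_k'g_{k-1}\left(g_k'H_{k-1}g_{k-1} - g_{k-1}'H_{k-1}g_{k-1}\right) - \left[(g_k'H_{k-1}g_k)(g_{k-1}'H_{k-1}g_{k-1}) - (g_{k-1}'H_{k-1}g_k)^2\right]. \qquad (4\text{-}19)$$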


Since $H_{k-1}$ is positive definite, $g_{k-1}'H_{k-1}g_{k-1} > 0$, and the sign of the quantity on the left side of equation (4-19) depends upon the sign of $c_k$. Schwarz's inequality shows that

$$-\left[(g_k'H_{k-1}g_k)(g_{k-1}'H_{k-1}g_{k-1}) - (g_{k-1}'H_{k-1}g_k)^2\right] \le 0. \qquad (4\text{-}20)$$
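Identities (4-17) through (4-19) are easy to mistype, so a quick numerical check is useful. The Python sketch below forms $p_k = -\alpha_{k-1}H_{k-1}g_{k-1}$, $q_k$, $z_k$ and $c_k$ directly and returns the residuals of (4-17), (4-18) and (4-19); all three vanish to rounding error for any symmetric $H_{k-1}$. The particular matrix, gradients and step lengths are arbitrary test values, and the function name is hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def residuals(H, g_prev, g, alpha):
    """Residuals of (4-17), (4-18) and (4-19) for a symmetric H = H_{k-1}."""
    p = [-alpha * v for v in matvec(H, g_prev)]   # p_k = -alpha_{k-1} H_{k-1} g_{k-1}
    q = [u - v for u, v in zip(g, g_prev)]         # q_k = g_k - g_{k-1}
    z = [u - v for u, v in zip(p, matvec(H, q))]   # z_k = p_k - H_{k-1} q_k
    c = dot(q, z)                                  # c_k = q_k' z_k
    a = dot(g_prev, matvec(H, g_prev))             # g_{k-1}' H g_{k-1}
    b = dot(g, matvec(H, g_prev))                  # g_k' H g_{k-1}
    d = dot(g, matvec(H, g))                       # g_k' H g_k
    zg = dot(z, g_prev)                            # z_k' g_{k-1}
    r17 = zg - ((1.0 - alpha) * a - b)
    r18 = c - ((1.0 - alpha) * b - d - zg)
    r19 = c * a - (zg * (b - a) - (d * a - b * b))
    return r17, r18, r19

H = [[2.0, 0.5], [0.5, 1.0]]
print(residuals(H, [1.0, -2.0], [0.3, 0.7], 0.6))   # three values at rounding level
```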


Now assume that $z_k'g_{k-1} > 0$ and examine (4-17):

$$z_k'g_{k-1} = (1 - \alpha_{k-1})\,g_{k-1}'H_{k-1}g_{k-1} - g_k'H_{k-1}g_{k-1} > 0.$$

If $0 < \alpha_{k-1} \le 1$, then

$$g_k'H_{k-1}g_{k-1} < (1 - \alpha_{k-1})\,g_{k-1}'H_{k-1}g_{k-1}.$$

But $H_{k-1}$ is positive definite and therefore $g_{k-1}'H_{k-1}g_{k-1} > 0$. Since $0 \le 1 - \alpha_{k-1} < 1$, it follows that

$$(1 - \alpha_{k-1})\,g_{k-1}'H_{k-1}g_{k-1} < g_{k-1}'H_{k-1}g_{k-1},$$

and hence

$$g_k'H_{k-1}g_{k-1} - g_{k-1}'H_{k-1}g_{k-1} < 0.$$

Therefore $z_k'g_{k-1} > 0$ implies that $z_k'g_{k-1}\left(g_k'H_{k-1}g_{k-1} - g_{k-1}'H_{k-1}g_{k-1}\right) < 0$, which by (4-19) implies

$$c_k\,g_{k-1}'H_{k-1}g_{k-1} + \left[(g_k'H_{k-1}g_k)(g_{k-1}'H_{k-1}g_{k-1}) - (g_{k-1}'H_{k-1}g_k)^2\right] < 0. \qquad (4\text{-}21)$$

From (4-20) and (4-21), then, $c_k < 0$. Therefore, if $z_k'g_{k-1} > 0$, then $z_k'g_{k-1}/c_k < 0$, which is as required by (4-16). Similarly, $c_k > 0$ implies that $z_k'g_{k-1} < 0$ (if $z_k'g_{k-1}$ were zero, (4-19) and (4-20) would give $c_k \le 0$), which once again satisfies condition (4-16). Unfortunately, though, $z_k'g_{k-1} \le 0$ and $c_k < 0$ can occur simultaneously. Then condition (4-16) is violated and $H_k$ will not be positive definite. Since it is necessary that this property be maintained, this can cause some serious problems.
Consider the results of taking a step with $\alpha_{k-1} > 1$. Equations (4-17) and (4-13) imply

$$g_k'H_{k-1}g_{k-1} - g_{k-1}'H_{k-1}g_{k-1} = -\alpha_{k-1}\,g_{k-1}'H_{k-1}g_{k-1} - z_k'g_{k-1}.$$

If $z_k'g_{k-1}$ is positive then, obviously, the right-hand side is negative, so (4-19) and (4-20) again give $c_k < 0$, and the positive definite requirement is satisfied.


Thus it might be wise first to take a step with $\alpha_{k-1} = 1$ and to test whether $z_k'g_{k-1} > 0$. If this is the case, then there is no problem and $H_k$ should be updated by the given recursion formula. If this step is taken but $z_k'g_{k-1} \le 0$, then generally the function has decreased. This in itself is desirable, since the process has reached a "better" point. If $z_k'g_{k-1} < 0$, it should then be tested whether $c_k > 0$. If this holds, then condition (4-16) again has been satisfied and thus the recursion formula should be applied. Here, though, it is wise to add a test to ensure that $c_k$ is not too close to zero, which would contribute additional problems. If this becomes the case, or if $z_k'g_{k-1}/c_k \ge 0$, then it is necessary to start again with a new $H_k$ from the latest best point. Murtagh and Sargent suggest two possible sources for this new $H_k$: either $I$ or the previous $H_{k-1}$. The first choice, of course, is simply starting over again with no information about $H$. This could be useful if $H_k$ has begun to accumulate misinformation concerning the function.
But, in general, it would probably be best not to destroy 


all the previous information. 
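The safeguards just described can be collected into a single update routine. The Python sketch below is one possible reading of them, not Murtagh and Sargent's published algorithm: the function name, the relative threshold `c_tol` used to decide that $c_k$ is "too close to zero", and the choice of $I$ as the replacement matrix are all assumptions. It applies (4-11) only when condition (4-16), $z_k'g_{k-1}/c_k < 0$, holds with $c_k$ safely away from zero, and otherwise restarts from the identity.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def safeguarded_update(H, p, q, g_prev, c_tol=1e-8):
    """Return (H_k, restarted) using the positive-definiteness test (4-16).

    H is H_{k-1}, p = p_k, q = q_k, g_prev = g_{k-1}.  c_tol is a hypothetical
    relative tolerance for deciding that c_k is too close to zero.
    """
    n = len(p)
    z = [pi - hi for pi, hi in zip(p, matvec(H, q))]   # z_k = p_k - H_{k-1} q_k
    if dot(z, z) == 0.0:
        return [row[:] for row in H], False            # H already maps q_k to p_k
    c = dot(q, z)                                      # c_k = q_k' z_k
    scale = dot(q, q) ** 0.5 * dot(z, z) ** 0.5
    if abs(c) <= c_tol * scale or dot(z, g_prev) / c >= 0.0:
        # c_k nearly zero, or (4-16) violated: restart from H = I
        return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)], True
    return [[H[i][j] + z[i] * z[j] / c for j in range(n)] for i in range(n)], False

I2 = [[1.0, 0.0], [0.0, 1.0]]
# Quadratic with A = diag(2, 1), unit step (alpha = 1) from g_{k-1} = (1, 0):
H1, r1 = safeguarded_update(I2, [-1.0, 0.0], [-2.0, 0.0], [1.0, 0.0])
print(H1, r1)   # update applied: H1 equals diag(0.5, 1), the exact inverse of A
# A gradient change for which z'g/c > 0, so (4-16) fails:
H2, r2 = safeguarded_update(I2, [-1.0, 0.0], [-0.5, 1.0], [1.0, 0.0])
print(H2, r2)   # restarted from the identity
```

Keeping the previous $H_{k-1}$ instead of $I$ on failure, the other choice mentioned above, would only change the restart line.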
Murtagh and Sargent offer a number of algorithms employing the methods they devised [10]. The one that they found to be the most useful involved the checks discussed thus far and, in addition, certain checks to ensure that the conditions of their theorem were met. By the criterion of fewest function evaluations required, this last method was generally found to be the most efficient. Again, this is as had been anticipated, because the need to minimize along the search directions was reduced. It was found that the conditions of the theorem of Murtagh and Sargent were far less restrictive than requiring actual minimization.

If function evaluations were the only criterion, it could be said that Murtagh and Sargent's method was generally superior to that of Davidon, Fletcher, and Powell. But because of all the tests that must be made, the method must surely be more difficult to program and, apart from function evaluations, more time consuming to run. In addition, since at times the conditions for positive definiteness can fail and a new starting matrix must be selected, the convergence of the method will obviously be slowed down. Should this resetting of $H$ be required too often, there is no doubt that all advantages this method might have would be lost.


C. PEARSON'S CLASS OF VARIABLE METRIC METHODS
In this section a class of related methods devised by Pearson [11] will be presented. Included in this class is the method invented by Davidon. Though the recursion formulas of these methods are different, their development is closely related, as will be shown in the following.

In the previous section it was shown that for a quadratic function $q_i = Ap_i$, where $q_i = g_{i+1} - g_i$ and $p_i = x_{i+1} - x_i$. Therefore $Hq_i = p_i$, where $H = A^{-1}$. Consider the following possibility. Assume $H_jq_i = p_i$ for $i = 1, \ldots, j-1$, where $H_j$ is the $j$th approximation to $A^{-1}$. If $H_j$ can be updated so that $H_{j+1}q_i = p_i$ for $i = 1, \ldots, j$, and if the set $\{p_i;\ i = 1, \ldots, n\}$ is linearly independent, then $H_{n+1} = A^{-1}$. This can be seen from the following. Assume

$$H_{n+1}q_i = p_i$$

for $i = 1, \ldots, n$; therefore

$$(H_{n+1}A)p_i = p_i$$

for $i = 1, \ldots, n$, and hence

$$(H_{n+1}A - I)p_i = 0.$$

But the set $\{p_i;\ i = 1, \ldots, n\}$ is linearly independent, which implies that $H_{n+1}A = I$, that is, $H_{n+1} = A^{-1}$. Now define the search directions as before, $d_i = -H_ig_i$; then, using the symmetry of $H_i$,

$$q_j'd_i = -q_j'H_ig_i = -p_j'g_i \quad\text{for } j = 1, \ldots, i-1.$$

Thus, if $p_j'g_i = 0$, then

$$q_j'd_i = 0 \qquad (4\text{-}22)$$

for $j = 1, \ldots, i-1$. But consider what occurs when the function is minimized along each search direction.


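If $j = i-1$, then by minimization along $d_{i-1}$,

$$p_{i-1}'g_i = \alpha_{i-1}d_{i-1}'g_i = 0.$$

If $j < i-1$, then, since $p_j'g_{i-1} = 0$ and $q_j'd_{i-1} = 0$ already hold from the previous step,

$$p_j'g_i = p_j'g_{i-1} + p_j'(g_i - g_{i-1}) = p_j'Ap_{i-1} = q_j'p_{i-1} = \alpha_{i-1}q_j'd_{i-1} = 0.$$

In either case,

$$p_j'g_i = 0, \qquad j = 1, \ldots, i-1,$$

which implies

$$q_j'd_i = 0, \qquad j = 1, \ldots, i-1,$$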

and therefore each new direction generated is conjugate to the previous ones (since $q_j = Ap_j = \alpha_jAd_j$, the condition $q_j'd_i = 0$ is just $d_j'Ad_i = 0$). Thus, provided that $H_iq_j = p_j$ for $j = 1, \ldots, i-1$, the method generates conjugate directions. The problem then is to find solutions to $H_iq_j = p_j$ for $j = 1, \ldots, i-1$. To do this, define the following matrices:


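$$Q_i = (q_1 \cdots q_{i-1}), \qquad P_i = (p_1 \cdots p_{i-1}).$$

In this notation the problem then is to find solutions to

$$H_iQ_i = P_i \qquad (4\text{-}23)$$

for $i = 1, \ldots, n$. Consider, with $H_1$ the initial approximation,

$$H_i = P_i(Q_i'MQ_i)^{-1}Q_i'M + H_1\left(I - Q_i(Q_i'M^*Q_i)^{-1}Q_i'M^*\right). \qquad (4\text{-}24)$$

Then

$$H_iQ_i = P_i(Q_i'MQ_i)^{-1}(Q_i'MQ_i) + H_1\left(Q_i - Q_i(Q_i'M^*Q_i)^{-1}(Q_i'M^*Q_i)\right) = P_i,$$

where $M$ and $M^*$ are arbitrary. Thus (4-24) defines a solution to (4-23). It was stated that $M$ and $M^*$ were arbitrary, but this is not completely true, since $Q_i'MQ_i$ and $Q_i'M^*Q_i$ must be nonsingular. If $M$ and $M^*$ are chosen to be positive definite (and the $q_j$ are linearly independent), then $Q_i'MQ_i$ and $Q_i'M^*Q_i$ are themselves positive definite and hence nonsingular. There are two matrices that seem to be likely choices for $M$ and $M^*$. First there is $H$, which, of course, can always be selected to be positive definite; and then there is $A^{-1}$, which is assumed to be positive definite. (The second choice can be used in practice because $Q_i'A^{-1} = P_i'$ involves only known quantities.)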
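That (4-24) solves (4-23) for any admissible $M$ and $M^*$ can be confirmed numerically. The Python sketch below builds $H_i$ from (4-24) with small pure-Python matrix helpers for an arbitrary $3 \times 2$ pair $P_i$, $Q_i$, the initial matrix $H_1 = I$, and two arbitrarily chosen positive definite matrices $M$ and $M^*$, and then measures how far $H_iQ_i$ is from $P_i$. All names and numbers are illustrative assumptions.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def inverse(X):
    """Gauss-Jordan inverse with partial pivoting (X assumed nonsingular)."""
    n = len(X)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(X)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [u - f * w for u, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def pearson_general(P, Q, H1, M, Ms):
    """H = P (Q'MQ)^{-1} Q'M + H1 (I - Q (Q'M*Q)^{-1} Q'M*), equation (4-24)."""
    n = len(H1)
    Qt = transpose(Q)
    first = matmul(matmul(P, inverse(matmul(matmul(Qt, M), Q))), matmul(Qt, M))
    proj = matmul(matmul(Q, inverse(matmul(matmul(Qt, Ms), Q))), matmul(Qt, Ms))
    rest = [[(1.0 if i == j else 0.0) - proj[i][j] for j in range(n)] for i in range(n)]
    second = matmul(H1, rest)
    return [[first[i][j] + second[i][j] for j in range(n)] for i in range(n)]

P = [[1.0, 0.0], [2.0, 1.0], [0.0, 3.0]]    # columns p_1, p_2
Q = [[3.0, 1.0], [1.0, 2.0], [1.0, 4.0]]    # columns q_1, q_2
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
M = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 3.0]]
Ms = [[1.0, 0.5, 0.0], [0.5, 2.0, 0.0], [0.0, 0.0, 1.0]]
H = pearson_general(P, Q, I3, M, Ms)
HQ = matmul(H, Q)
print(max(abs(HQ[i][j] - P[i][j]) for i in range(3) for j in range(2)))  # rounding level
```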

Since nothing in (4-24) specifies otherwise, $M$ and $M^*$ can be chosen independently. By doing so, four different forms of (4-24) can be produced. Pearson makes use of a lemma called the Bordered Inverse Lemma to produce the following recursion formulas from equation (4-24):

$$H_{i+1} = H_i + (p_i - H_iq_i)p_i'/p_i'q_i \qquad (4\text{-}25)$$

$$H_{i+1} = H_i + (p_i - H_iq_i)(H_iq_i)'/q_i'H_iq_i \qquad (4\text{-}26)$$

$$H_{i+1} = H_i + p_ip_i'/p_i'q_i - (H_iq_i)(H_iq_i)'/q_i'H_iq_i. \qquad (4\text{-}27)$$

Notice that (4-27) is exactly (4-1), which was Fletcher, Powell, and Davidon's recursion formula.

It can readily be seen that equations (4-25) and (4-26) both produce matrices $H_{i+1}$ that will not, in general, be symmetric. This is, of course, a slight disadvantage, since it will require additional storage space as compared with Davidon's method, which produces symmetric matrices. This, though, should only be significant in problems involving a large number of variables. In general, the results so far indicate that the three recursion formulas given above produce similar results. For some functions it may be necessary to replace the current approximation of $A^{-1}$ with the positive definite matrix that was originally chosen. This happens if the approximation becomes singular as the minimum is approached. Generally, though, this is not required.
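The three recursion formulas can be compared directly. In the Python sketch below, each of (4-25), (4-26) and (4-27) is applied to the same symmetric $H_i$; all three satisfy $H_{i+1}q_i = p_i$, but only (4-27) returns a symmetric matrix. The function names are hypothetical, and $H_i$, $p_i$, $q_i$ are arbitrary test values chosen with $p_i'q_i \ne 0$ and $q_i'H_iq_i \ne 0$.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def update_425(H, p, q):
    """(4-25): H + (p - Hq) p' / p'q."""
    r = [pi - hi for pi, hi in zip(p, matvec(H, q))]
    pq = dot(p, q)
    return [[H[i][j] + r[i] * p[j] / pq for j in range(len(p))] for i in range(len(p))]

def update_426(H, p, q):
    """(4-26): H + (p - Hq)(Hq)' / q'Hq."""
    Hq = matvec(H, q)
    r = [pi - hi for pi, hi in zip(p, Hq)]
    qHq = dot(q, Hq)
    return [[H[i][j] + r[i] * Hq[j] / qHq for j in range(len(p))] for i in range(len(p))]

def update_427(H, p, q):
    """(4-27), the formula of Davidon, Fletcher and Powell: H + pp'/p'q - (Hq)(Hq)'/q'Hq."""
    Hq = matvec(H, q)
    pq, qHq = dot(p, q), dot(q, Hq)
    return [[H[i][j] + p[i] * p[j] / pq - Hq[i] * Hq[j] / qHq
             for j in range(len(p))] for i in range(len(p))]

def is_symmetric(M, tol=1e-12):
    return all(abs(M[i][j] - M[j][i]) <= tol for i in range(len(M)) for j in range(i))

H = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
p, q = [1.0, 2.0, 0.0], [2.0, 1.0, 1.0]
for update in (update_425, update_426, update_427):
    Hn = update(H, p, q)
    secant_ok = max(abs(u - v) for u, v in zip(matvec(Hn, q), p)) < 1e-12
    print(update.__name__, secant_ok, is_symmetric(Hn))
# update_425 True False
# update_426 True False
# update_427 True True
```

Note that the check of (4-26) relies on $H_i$ being symmetric, since it uses $(H_iq_i)'q_i = q_i'H_iq_i$.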
VI. CONCLUSIONS


In the writing of this paper the author has studied the literature. Of course, this paper has by no means covered every possible method by which the unconstrained minimization problem may be solved. But an attempt has been made to offer as wide a coverage as possible of the different techniques which are available. The methods included are those which have been found to be the most reliable in solving actual problems. Research with computers on specific problems has shown that no one method is guaranteed to outperform all others on every problem. In general, then, the greatest difficulty might be the actual selection of the method to be employed.

To make this decision, it is important to consider all the information about the function which is available. This includes such things as having second partial derivatives which may be computed, knowing only the gradients, or having access only to the function values. Generally, gradient methods have been found to be the most reliable, but this refers to methods applied to functions for which the gradient is available from analytic expressions. On occasion, though, for some functions the gradient is only calculable through numerical methods. When this is the case, the accuracy of the entire method is greatly reduced. In fact, in these cases it is best, as a rule, to use one of the methods employing only function values.
For most functions there may be a number of methods that can be used to obtain the minimum. Thus, if all that is required is the proper minimum, then the problem of selection may be greatly reduced. But in practice there are, of course, other important factors which must be considered.

Computer time for the solution is one of these vital factors. Consider, for example, the Steepest Descent method. For certain functions this method may have an extremely slow rate of convergence. But if this technique leads to the correct minimum, then it must be included among the methods from which the one method to be used is selected. Yet to select this method in such a case would be a serious mistake: there may be another method which could solve the problem in one tenth the time. With all other factors equal, this other method would obviously be the better choice.

In general, though, the relative advantages of each method are unknown for any given function. It would be best to study the function to be minimized before any selection of a technique is made. If any special characteristic of the function can be identified, then it may be possible to make a wiser selection of the method to be used.

The author has found the book by Box [3] and the book 


by Kowalik and Osborne [9] to be particularly helpful. 
APPENDIX A: LINEAR SEARCH TECHNIQUES


In this appendix a number of methods will be presented for finding the minimum of a function along a given line. Since many of the minimization techniques discussed in this paper require one of these methods, their importance cannot be minimized.


1. FIBONACCI SEARCH [3,9]
Assume that we decide to make $N$ function evaluations within the interval $(x_1, x_2)$ where a minimum is known to exist. Assume the original neighborhood is designated $(x_1^1, x_2^1)$. Then define the following points:

$$x_3^1 = \frac{F_{N-2}}{F_N}(x_2^1 - x_1^1) + x_1^1, \qquad x_4^1 = \frac{F_{N-1}}{F_N}(x_2^1 - x_1^1) + x_1^1,$$

where $F_N$ is a Fibonacci number, defined by the following relations:

$$F_0 = F_1 = 1, \qquad F_k = F_{k-1} + F_{k-2}, \quad k \ge 2.$$


If $f(x_3^1) > f(x_4^1)$, then the minimum must lie between $x_3^1$ and $x_2^1$. Otherwise the minimum must lie between $x_1^1$ and $x_4^1$.


Consider $(x_2^1 - x_3^1)$ and $(x_4^1 - x_1^1)$:

$$x_2^1 - x_3^1 = (x_2^1 - x_1^1) - \frac{F_{N-2}}{F_N}(x_2^1 - x_1^1) = \frac{F_N - F_{N-2}}{F_N}(x_2^1 - x_1^1) = \frac{F_{N-1}}{F_N}(x_2^1 - x_1^1). \qquad (A\text{-}1)$$

Thus,

$$x_4^1 - x_1^1 = \frac{F_{N-1}}{F_N}(x_2^1 - x_1^1). \qquad (A\text{-}2)$$

Thus (A-1) and (A-2) show that, regardless of which interval the minimum has been restricted to, the length of the interval is $F_{N-1}/F_N$ times the length of the original interval.
The endpoints of the interval containing the minimum are then relabeled $x_1^2$ and $x_2^2$. The process is then repeated using the following general formulas:

$$x_3^i = \frac{F_{N-1-i}}{F_{N+1-i}}(x_2^i - x_1^i) + x_1^i, \qquad x_4^i = \frac{F_{N-i}}{F_{N+1-i}}(x_2^i - x_1^i) + x_1^i,$$

for $i = 1, \ldots, N-1$. The final two points would, by the above formulas, coincide and thus should be offset by some small $\varepsilon$. The length of the final interval to which the minimum is restricted is

$$\frac{x_2^1 - x_1^1}{F_N} + \varepsilon.$$

Thus the accuracy to which the minimum is found depends upon the size of the original interval and the number of function evaluations to be made. By the nature of Fibonacci numbers it is only necessary to make one function evaluation per iteration after the initial iteration, since one of the two interior points of each new interval coincides with an interior point of the previous one.
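A minimal Python sketch of the Fibonacci search just described follows, assuming $f$ is unimodal on the starting interval and $N \ge 3$. It uses $F_0 = F_1 = 1$, reuses one interior point at every iteration so that exactly $N$ function values are computed, and offsets the last point by `eps` as described above. The function name and the default value of `eps` are illustrative assumptions.

```python
def fibonacci_search(f, lo, hi, N, eps=1e-6):
    """Return an interval (a, b) containing the minimum of a unimodal f on (lo, hi)."""
    if N < 3:
        raise ValueError("N must be at least 3")
    F = [1, 1]                                 # F_0 = F_1 = 1, F_k = F_{k-1} + F_{k-2}
    while len(F) <= N:
        F.append(F[-1] + F[-2])
    a, b = lo, hi
    x3 = a + F[N - 2] / F[N] * (b - a)         # x_3^1
    x4 = a + F[N - 1] / F[N] * (b - a)         # x_4^1
    f3, f4 = f(x3), f(x4)
    for i in range(1, N - 1):                  # one new evaluation per pass
        if f3 > f4:                            # minimum lies between x_3 and x_2
            a = x3
            x3, f3 = x4, f4                    # old x_4 becomes the new x_3
            x4 = a + F[N - 1 - i] / F[N - i] * (b - a)
            if i == N - 2:                     # final two points coincide: offset by eps
                x4 = x3 + eps
            f4 = f(x4)
        else:                                  # minimum lies between x_1 and x_4
            b = x4
            x4, f4 = x3, f3                    # old x_3 becomes the new x_4
            x3 = a + F[N - 2 - i] / F[N - i] * (b - a)
            if i == N - 2:
                x3 = x4 - eps
            f3 = f(x3)
    if f3 > f4:
        a = x3
    else:
        b = x4
    return a, b

calls = []
def f(x):
    calls.append(x)                            # record every evaluation
    return (x - 0.3) ** 2

a, b = fibonacci_search(f, 0.0, 1.0, 20)
print(len(calls), a <= 0.3 <= b)               # 20 True; b - a is about 1/F_20 = 1/10946
```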

2. DAVIES, SWANN AND CAMPEY'S SEARCH TECHNIQUE [3,9]

In this method the function is approximated along the line by a quadratic. If the three points used to locate the minimum are separated by an interval greater than some specified size, the operation is repeated with a smaller interval.

For this method an initial step size is decided upon, depending upon the estimated distance to the minimum from the current point. This step size should be about one fourth of this estimated distance. The initial step is taken toward the minimum and the function is evaluated at this new point. If the function has increased, then cut the step size and begin again from the initial point. This is done until a point is found for which the function has decreased. The step size is then doubled and a new step is taken from the latest point. This process is continued until a function increase is located. At this time the current step size is cut in half and a step is taken from the last point at which the function decreased. The last four points found are thus equally spaced, say $s$ units apart, and define an interval in which the minimum must lie. The end point of this interval furthest from the point at which the function has the smallest value is discarded, and the remaining three points are used to approximate the minimum. Let $x_1$, $x_2$, and $x_3$ be these three points, and let $f_1$, $f_2$, and $f_3$ be the respective function values at these points. Let us assume a quadratic approximation for $f$ on the line,

$$f(t) = at^2 + bt + c,$$

in which the values $-1$, $0$, $1$ for $t$ correspond to the points $x_1$, $x_2$, and $x_3$ respectively.


Thus

$$a = \frac{f_1 - 2f_2 + f_3}{2}, \qquad b = \frac{f_3 - f_1}{2}, \qquad c = f_2.$$

The minimum of $f(t)$ is at

$$t^* = \frac{f_1 - f_3}{2(f_1 - 2f_2 + f_3)},$$

and the point $x^*$ corresponding to $t^*$ is $x^* = x_2 + t^*(x_2 - x_1)$. The function is then evaluated at this new point, and the process is begun again with a reduced step size from the point at which the function had the smallest value. The process is continued until the change in successive approximations to the minimum is less than half the desired accuracy.
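The following Python sketch follows the Davies, Swann and Campey steps described above for a function of one variable (the line parameter). Several details are assumptions rather than part of the description: the step is tried in both directions before being halved, each new round starts with one tenth of the final spacing (`shrink`), a round also ends when neither direction decreases $f$ even with a step shorter than `accuracy`, and `max_rounds` and `max_doublings` are added only as guards. The function name is hypothetical.

```python
def dsc_search(f, x0, step, accuracy, shrink=0.1, max_rounds=100, max_doublings=60):
    """Davies-Swann-Campey style line search; returns an estimate of the minimizer."""
    x_best, f_best = x0, f(x0)
    s = step
    previous = None
    for _ in range(max_rounds):
        # find a direction in which f decreases, halving the step if neither does
        while True:
            if f(x_best + s) < f_best:
                h = s
                break
            if f(x_best - s) < f_best:
                h = -s
                break
            s *= 0.5
            if s < accuracy:
                return x_best
        # doubling phase: continue until the first increase
        xs = [x_best, x_best + h]
        fs = [f_best, f(x_best + h)]
        for _ in range(max_doublings):
            h *= 2.0
            xs.append(xs[-1] + h)
            fs.append(f(xs[-1]))
            if fs[-1] >= fs[-2]:
                break
        # step back half of the last step: four equally spaced points
        xm = xs[-1] - h / 2.0
        fm = f(xm)
        pts = [(xs[-3], fs[-3]), (xs[-2], fs[-2]), (xm, fm), (xs[-1], fs[-1])]
        # discard the end point furthest from the lowest of the four points
        trio = pts[1:] if fm < fs[-2] else pts[:3]
        (x1, f1), (x2, f2), (x3, f3) = trio
        denom = f1 - 2.0 * f2 + f3
        t = (f1 - f3) / (2.0 * denom) if denom > 0.0 else 0.0   # t* of the fitted quadratic
        x_star = x2 + t * (x2 - x1)
        f_star = f(x_star)
        # begin again from the best point found, with a reduced step
        x_best, f_best = min(trio + [(x_star, f_star)], key=lambda pair: pair[1])
        if previous is not None and abs(x_star - previous) < accuracy / 2.0:
            return x_best
        previous = x_star
        s = shrink * abs(x2 - x1)
    return x_best

x_q = dsc_search(lambda x: (x - 2.0) ** 2 + 1.0, 0.0, 0.25, 1e-8)
x_c = dsc_search(lambda x: x ** 4 - 3.0 * x, 0.0, 0.25, 1e-6)
print(x_q, x_c)   # 2.0, and a value close to (3/4)**(1/3), about 0.9086
```

For the quadratic the first interpolation is already exact, which is why the first result is exactly 2.0.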
3. POWELL'S ALGORITHM [3,9]

This method differs from the previous method only in the manner of selecting the three points from which to interpolate the minimum. Assuming $x_1$ is the initial point, then $x_2 = x_1 + S$ is selected, where $S$ is a fixed displacement. The third point, $x_3$, is chosen as follows:

$$x_3 = \begin{cases} x_1 + 2S & \text{if } f_1 > f_2, \\ x_1 - S & \text{if } f_1 \le f_2. \end{cases}$$

Using these three points, another quadratic interpolation is performed as outlined above. If the new point differs from the point where the function had its smallest value by less than the required accuracy, then this new point is assumed to be the desired minimum. Otherwise the point at which the function is largest is discarded and a new interpolation is made using the remaining three points.
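A minimal Python sketch of this procedure follows. Because the points are generally no longer equally spaced after the first interpolation, it uses the general three-point quadratic interpolation written with divided differences, which reduces to the formula for $t^*$ above when the spacing is equal. The cap `max_step` on the length of the predicted step, and the outward step taken when the fitted quadratic has no minimum, are safeguards assumed here for robustness; the function name and the `max_iter` guard are likewise illustrative.

```python
def powell_quadratic_search(f, x1, S, accuracy, max_step=1.0, max_iter=100):
    """Quadratic-interpolation line search in the manner of Powell's algorithm."""
    f1 = f(x1)
    x2 = x1 + S
    f2 = f(x2)
    x3 = x1 + 2.0 * S if f1 > f2 else x1 - S
    pts = [(x1, f1), (x2, f2), (x3, f(x3))]
    for _ in range(max_iter):
        (a, fa), (b, fb), (c, fc) = sorted(pts)      # order the points by abscissa
        dab = (fb - fa) / (b - a)                    # first divided differences
        dbc = (fc - fb) / (c - b)
        curv = (dbc - dab) / (c - a)                 # second divided difference
        xb = min(pts, key=lambda pt: pt[1])[0]       # point with the smallest f
        if curv > 0.0:
            x_new = 0.5 * (a + b) - dab / (2.0 * curv)   # minimum of the fitted quadratic
        else:
            # fitted quadratic has no minimum: step outward from the lowest end point
            x_new = xb + (max_step if xb == c else -max_step)
        if abs(x_new - xb) > max_step:               # limit the length of the step
            x_new = xb + (max_step if x_new > xb else -max_step)
        if abs(x_new - xb) < accuracy:
            return x_new
        # add the new point and discard the one with the largest function value
        pts = sorted(pts + [(x_new, f(x_new))], key=lambda pt: pt[1])[:3]
    return min(pts, key=lambda pt: pt[1])[0]

x_q = powell_quadratic_search(lambda x: (x - 2.0) ** 2 + 1.0, 0.0, 0.5, 1e-8)
x_c = powell_quadratic_search(lambda x: x ** 4 - 3.0 * x, 0.0, 0.1, 1e-7)
print(x_q, x_c)   # 2.0, and a value close to (3/4)**(1/3), about 0.9086
```

Without the step cap, the second example would predict a point near $x = 21$ on its first interpolation and then discard it again immediately, so some limit of this kind is needed in practice.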
4. DAVIDON'S CUBIC INTERPOLATION 

Davidon uses a cubic interpolation based on values for the function and its gradient at two points of the line [4, ...].

Assume that the value of the function and its gradient are known at two points, $x$ and $y$, where $y = x + \alpha S$. This method calls for using a cubic interpolation as follows.


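Let

$$f(\lambda) = a\lambda^3 + b\lambda^2 + c\lambda + d,$$

where $f(\lambda)$ stands for $f(x + \lambda S)$, so that

$$\begin{aligned}
f_x &= f(0) = d, \\
f_y &= f(\alpha) = a\alpha^3 + b\alpha^2 + c\alpha + d, \\
g_{sx} &= \nabla f(x)'S = f'(0) = c, \\
g_{sy} &= \nabla f(y)'S = f'(\alpha) = 3a\alpha^2 + 2b\alpha + c.
\end{aligned}$$

Therefore

$$\begin{aligned}
a &= \left(\alpha(g_{sx} + g_{sy}) + 2(f_x - f_y)\right)/\alpha^3, \\
b &= \left(3(f_y - f_x) - \alpha(2g_{sx} + g_{sy})\right)/\alpha^2, \\
c &= g_{sx}, \\
d &= f_x.
\end{aligned}$$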


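Thus

$$\begin{aligned}
f'(\lambda) &= 3a\lambda^2 + 2b\lambda + c \\
&= 3\left(\alpha(g_{sx} + g_{sy}) + 2(f_x - f_y)\right)\lambda^2/\alpha^3 + 2\left(3(f_y - f_x) - \alpha(2g_{sx} + g_{sy})\right)\lambda/\alpha^2 + g_{sx} \\
&= (g_{sx} + g_{sy} + 2B)(\lambda/\alpha)^2 - 2(g_{sx} + B)(\lambda/\alpha) + g_{sx},
\end{aligned}$$

where $B = 3(f_x - f_y)/\alpha + g_{sx} + g_{sy}$.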
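From the quadratic formula it is found that the desired zero is at

$$\frac{\lambda}{\alpha} = \frac{(g_{sx} + B) \pm \sqrt{(g_{sx} + B)^2 - g_{sx}(g_{sx} + g_{sy} + 2B)}}{g_{sx} + g_{sy} + 2B} = \frac{g_{sx} + B \pm Q}{g_{sx} + g_{sy} + 2B},$$

where

$$Q = \left(B^2 - g_{sx}g_{sy}\right)^{1/2}.$$

The positive sign gives the minimum, since the second derivative there is proportional to $2Q > 0$, and

$$\frac{g_{sx} + B + Q}{g_{sx} + g_{sy} + 2B} = 1 - \frac{g_{sy} + B - Q}{g_{sx} + g_{sy} + 2B} = 1 - \frac{g_{sy} + Q - B}{g_{sy} - g_{sx} + 2Q}.$$

And thus

$$\frac{\lambda}{\alpha} = 1 - R, \qquad R = \frac{g_{sy} + Q - B}{g_{sy} - g_{sx} + 2Q}. \qquad (A\text{-}3)$$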
Since the condition for a minimum along $S$ is that the component of the gradient along $S$ is zero, the above equation (A-3) gives an estimate for this minimum. Thus the estimate of $k$ such that $f(x + kS)$ is a minimum is

$$k = \alpha(1 - R).$$

For the solution to this problem, then, it is only necessary to ensure that a reasonable choice is made for $\alpha$.
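The cubic-interpolation estimate, $k = \alpha(1 - R)$ with $R = (g_{sy} + Q - B)/(g_{sy} - g_{sx} + 2Q)$, $B = 3(f_x - f_y)/\alpha + g_{sx} + g_{sy}$ and $Q = (B^2 - g_{sx}g_{sy})^{1/2}$, translates directly into code. The Python sketch below uses this form of $R$, which stays finite when the cubic term vanishes ($g_{sx} + g_{sy} + 2B = 0$). It assumes, as is usual for this interpolation though not stated above, that $g_{sx} < 0$ and that $g_{sy} > 0$ or $f_y > f_x$, which keeps $B^2 - g_{sx}g_{sy}$ nonnegative; the code raises an error otherwise. The function name is illustrative. Because the interpolant is exact for cubics, the estimate reproduces the minimum of any cubic along the line.

```python
import math

def davidon_cubic_step(fx, fy, gsx, gsy, alpha):
    """Estimate k minimizing f(x + kS) from a cubic fitted at k = 0 and k = alpha."""
    B = 3.0 * (fx - fy) / alpha + gsx + gsy
    disc = B * B - gsx * gsy
    if disc < 0.0:
        raise ValueError("fitted cubic has no real minimum: B^2 - gsx*gsy < 0")
    Q = math.sqrt(disc)
    R = (gsy + Q - B) / (gsy - gsx + 2.0 * Q)
    return alpha * (1.0 - R)

# phi(k) = (k - 1)^2 sampled at k = 0 and k = 2: a quadratic, so the cubic term vanishes
print(davidon_cubic_step(1.0, 1.0, -2.0, 2.0, 2.0))    # 1.0
# phi(k) = k^3 - 3k sampled at k = 0 and k = 2: its minimum is at k = 1
print(davidon_cubic_step(0.0, 2.0, -3.0, 9.0, 2.0))    # 1.0
```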